
I run Scrapy from a script (https://doc.scrapy.org/en/latest/topics/practices.html#run-scrapy-from-a-script) launched from AWS Lambda. I build the project with SAM and everything deploys correctly.

But now I have a problem with the LOG_LEVEL setting.

from scrapy.crawler import CrawlerProcess

def handler(event, context):
    settings = {
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36',
        'LOG_ENABLED': True,
        'LOG_LEVEL': 'ERROR'
    }

    process = CrawlerProcess(settings=settings)
    process.crawl(Spider)
    process.start()

When I execute this code locally, everything works as expected and I only receive ERROR-level logs, but when I execute it in AWS Lambda I receive DEBUG-level logs, and I don't know how to resolve this.

  • The Python environment in Lambda has a preconfigured root logger. I think Scrapy might be clashing with that, but I'm not 100% sure. Can you try configuring the Scrapy logging manually with scrapy.utils.log.configure_logging(install_root_handler=False) to see if it helps? Commented Jan 21, 2019 at 18:20
  • @MilanCermak I tried this configuration but it doesn't work. I put the line before the settings dict, inside the Lambda handler. Is that correct? Commented Jan 22, 2019 at 8:44
  • Yes, that should be OK. What is weird to me is that you get even DEBUG messages. Something is definitely "messing" with the logging setup. Maybe try one more thing: at the top level (outside of the Lambda handler), get the root logger with root = logging.getLogger() and call Scrapy's configure_logging with your settings. HTH. Commented Jan 22, 2019 at 13:10
  • @MilanCermak I tried root = logging.getLogger() outside of the Lambda handler and scrapy.utils.log.configure_logging(settings=settings) (with install_root_handler both True and False) after the settings dict, and the behaviour is the same. It still shows the DEBUG logs. :( Any idea? Commented Jan 22, 2019 at 17:13
  • Whoops, sorry; somehow my comment above missed the main part I wanted to convey: remove all the handlers from the root logger first, before calling the configure function. Commented Jan 22, 2019 at 17:18

1 Answer


Based on the input from the OP in Scrapy issue #3587, it turns out AWS Lambda installs its own handlers on the root logger, so you need to remove those handlers before you use Scrapy:

from logging import getLogger

getLogger().handlers = []

def handler(event, context):  # AWS Lambda entry point
    pass  # Your code to call Scrapy.

6 Comments

Yes, the Scrapy issue was opened by me, based on @Milan Cermak's help. I was waiting for him to post the answer, but your response is correct too. Thanks.
Hi @Gallaecio, I had another problem with AWS Lambda containers and Scrapy. When I execute the code locally it doesn't fail, but when I execute it in AWS Lambda containers twice in a short period of time, it produces an error, which I put in this gist: gist.github.com/milancermak/945b54107d6c238ac079ea2fcd39be29 Thanks for the help!
Please open a separate question so that it can be found easily, both by people who know the answer and by people who may have the same question.
You are right, this is the new question: stackoverflow.com/questions/54350888/… Thanks for everything, @Gallaecio
Hey folks. No worries @nicoparsa, feel free to upvote and accept this answer. Glad we found a solution.
