About the Role
1. Web Scraping Development:
• Design, develop, and maintain site-specific scraping scripts using Python.
• Ensure scripts are optimized for efficiency and accuracy, minimizing the risk of being blocked or banned.
2. Data Extraction and Transformation:
• Extract relevant data from various websites and transform it into a structured format suitable for database insertion or further analysis.
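As an illustration of this kind of extract-and-transform step, the sketch below parses raw HTML into structured records ready for database insertion. It uses BeautifulSoup (mentioned under Requirements); the CSS classes, field names, and sample markup are hypothetical.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical page fragment standing in for a scraped response body.
HTML = """
<html><body>
  <div class="listing"><span class="name">Hotel A</span><span class="price">$120</span></div>
  <div class="listing"><span class="name">Hotel B</span><span class="price">$95</span></div>
</body></html>
"""

def extract_listings(html: str) -> list[dict]:
    """Parse raw HTML into structured records suitable for DB insertion."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for div in soup.select("div.listing"):
        records.append({
            "name": div.select_one("span.name").get_text(strip=True),
            # Transform "$120" into a numeric value during extraction.
            "price": float(div.select_one("span.price").get_text(strip=True).lstrip("$")),
        })
    return records

print(extract_listings(HTML))
```

In practice the selectors and field mapping would be site-specific, which is why each scraper script is maintained separately.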
3. Collaboration:
• Work closely with the data engineering team to understand data needs and requirements.
• Collaborate with the data analytics team to ensure the scraped data meets the analysis criteria.
4. Error Handling:
• Implement robust error handling and logging mechanisms to capture and address any issues during the scraping process.
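A minimal sketch of the error-handling-plus-logging pattern described above, using only the standard library. The function name, the injected `fetch` callable, and the retry parameters are illustrative, not a prescribed implementation.

```python
import logging
import time
from urllib.error import HTTPError, URLError

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("scraper")

def fetch_with_retries(fetch, url, max_attempts=3, backoff=2.0):
    """Call `fetch(url)`, logging and retrying transient failures.

    Retries with exponential backoff, then re-raises so the caller
    can record the URL for later investigation.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except (HTTPError, URLError, TimeoutError) as exc:
            log.warning("attempt %d/%d failed for %s: %s", attempt, max_attempts, url, exc)
            if attempt == max_attempts:
                log.error("giving up on %s", url)
                raise
            time.sleep(backoff ** attempt)  # back off before the next attempt
```

Capturing the attempt count and the exception in the log line makes it easy to tell a one-off network blip from a site that has started blocking the scraper.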
5. Continuous Learning:
• Stay updated with the latest web scraping techniques, tools, and best practices.
• Adapt to changes in website structures or anti-scraping measures.
6. Documentation:
• Document scraping processes, methodologies, and scripts for future reference and for the benefit of other team members.
Requirements
Qualifications:
• Bachelor's degree in Computer Science or a related field.
• Proven experience in web scraping, preferably using Python.
• Familiarity with web scraping frameworks like Scrapy, BeautifulSoup, or similar tools.
• Strong understanding of web technologies (HTML, CSS, JavaScript) and how they impact scraping.
• Knowledge of databases and SQL.
• Excellent problem-solving skills and attention to detail.
• Ability to work independently and as part of a team.
Preferred Skills:
• Experience with cloud platforms like AWS, Google Cloud, or Azure.
• Familiarity with CAPTCHAs, proxy rotation, and other anti-scraping countermeasures.
• Knowledge of data storage solutions and optimization techniques.
About the Company
Aggregate Intelligence is a platform-agnostic, fully integrated BI provider serving vertical markets such as travel, retail, and real estate. We focus on the seamless delivery of data intelligence to the user, whether in raw data feed form via APIs, in dynamic spreadsheet/platform integrations, or in custom-built vertical BI solutions targeting a specific problem for a specific user base.