At Defined.ai, the largest marketplace for ethically sourced AI training data, we are thrilled to have added a new Retail Image Dataset to our platform. This dataset focuses on the brick-and-mortar retail domain, primarily from Brazil. It contains almost 8 million images of assorted items on shelves, storefronts, close-ups of price tags, and more.
With this Retail Image Dataset, AI builders will be able to train new models for a wide range of use cases. The dataset was sourced in accordance with Defined.ai’s strict ethical standards, ensuring that all data was obtained with the informed consent of those involved and that privacy rights were respected. We believe ethical sourcing of AI training data is of utmost importance, and we take that responsibility seriously. We are confident that our clients will appreciate the care and attention we put into curating this data.

Why are we so excited about this Retail Image Dataset?
It is rare for a dataset like this one to become available. For starters, aggregating images from different retail locations across Brazil and organizing them into a cohesive dataset is a complex and time-consuming task. Datasets of this nature are also valuable and can give a competitive edge to the companies that have access to them. As a result, organizations may be reluctant to share such data or make it publicly available.
Defined.ai stands on its principles of ethical AI. Rather than scraping data from the internet, we work hand in hand with our partners to showcase and evangelize their datasets to the world. Because of this, we can proudly offer you the dataset today.
Another approach is to use machine learning algorithms, which learn to identify and categorize named entities from a large corpus of labelled training data. These algorithms can be trained to recognize a wide range of entity types and can handle complex language, making them a more robust and flexible solution for named entity recognition (NER).
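To make the idea of learning from labelled data concrete, here is a minimal sketch of a token-level tagger trained with a simple perceptron. Everything in it is an assumption made for illustration: the four-sentence toy corpus, the PER/LOC/O tag set, the hand-picked features, and the function names are hypothetical and do not come from any particular library or from a production NER system, which would use far more data and a stronger model.

```python
from collections import defaultdict

# Hypothetical toy corpus: each sentence is a list of (token, tag) pairs.
# PER = person, LOC = location, O = not an entity.
TRAIN = [
    [("Maria", "PER"), ("works", "O"), ("in", "O"), ("Lisbon", "LOC")],
    [("John", "PER"), ("lives", "O"), ("in", "O"), ("Brazil", "LOC")],
    [("Anna", "PER"), ("travelled", "O"), ("to", "O"), ("Seattle", "LOC")],
    [("The", "O"), ("store", "O"), ("is", "O"), ("in", "O"), ("Porto", "LOC")],
]


def features(tokens, i):
    """Describe token i with a few simple, hand-picked clues."""
    word = tokens[i]
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return [
        "bias",
        "word=" + word.lower(),
        "is_title=" + str(word.istitle()),
        "prev=" + prev,
        "is_first=" + str(i == 0),
    ]


def predict(weights, feats, labels):
    """Return the label whose summed feature weights are highest."""
    scores = {label: sum(weights[(f, label)] for f in feats) for label in labels}
    return max(labels, key=lambda label: scores[label])


def train(data, epochs=10):
    """Perceptron training: adjust weights only when a prediction is wrong."""
    labels = sorted({tag for sent in data for _, tag in sent})
    weights = defaultdict(float)
    for _ in range(epochs):
        for sent in data:
            tokens = [tok for tok, _ in sent]
            for i, (_, gold) in enumerate(sent):
                feats = features(tokens, i)
                guess = predict(weights, feats, labels)
                if guess != gold:
                    for f in feats:
                        weights[(f, gold)] += 1.0   # reward the correct label
                        weights[(f, guess)] -= 1.0  # penalize the wrong guess
    return weights, labels


def tag(weights, labels, tokens):
    """Tag every token of a sentence with the trained weights."""
    return [
        (tok, predict(weights, features(tokens, i), labels))
        for i, tok in enumerate(tokens)
    ]


weights, labels = train(TRAIN)
# Tag a sentence whose names never appeared in the training data.
print(tag(weights, labels, ["Helen", "moved", "to", "Recife"]))
```

The tagger is never given a list of names or places. It only sees labelled examples and picks up cues such as capitalization and the preceding word, which is why this kind of approach can generalize to entities it has not seen before, provided the training data is large and representative enough.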

What should I consider if I want to crowdsource these annotations in an ethical way?
- The annotators should be fully informed about the task and their role in the project, including the type of data being annotated and how it will be used.
- The annotators should be paid fairly for their work and provided with transparent and timely payment.
- The annotators’ privacy and data security should be protected, and they should not be asked to annotate any sensitive or personal information.
Another advantage of open-source aspect-based sentiment analysis (ABSA) datasets is that they’re widely used and well-known in the natural language processing (NLP) community, meaning that there’s a wealth of information and support available for anyone using these datasets in their own projects. For example, many open-source ABSA datasets come with documentation on how to use the data effectively.
3 Comments:
Charles C. Ragsdale
Friday, April 21, 2023 AT 12:53 AM
Open-source datasets are important for sentiment analysis projects for several reasons. First and foremost, they provide researchers and developers with a common set of data for the development and evaluation of sentiment analysis models. This allows for accurate comparisons between different approaches and helps ensure that progress in the field is being made.