Knowledge Base

Best Practices for Querying Rosetta Stone

Querying the Rosetta Stone global data catalog on the Narrative Data Collaboration Platform requires a strategic approach to maximize efficiency and manage data retrieval costs effectively. This guide outlines best practices for querying Rosetta Stone, emphasizing cost control, attribute selection, and query optimization.

Understand Rosetta Stone

The Rosetta Stone global data catalog allows you to query and retrieve data from any company that has both mapped their data to a Rosetta Stone attribute, and provided you with an access rule that allows you to query their data. Your query must adhere to all applicable query policies. Your query may come along with license terms and costs which will need to be adhered to and paid out according to your terms with the provider.

Cost Control Measures

When querying datasets that involve data licensing costs, it's crucial to implement cost control measures to manage your expenditure effectively. Refer to the Cost Controls in NQL Queries document for comprehensive strategies, including the use of CPM filters and budget limits.

Quick Overview of Adding a Budget and a CPM Filter

CPM Filter: Specify the maximum cost per 1000 rows of data to ensure the query retrieves only data within your budget. This filter is applied in the WHERE clause.
Budget Limit: Set a maximum budget for your query using the LIMIT [X] USD PER CALENDAR_MONTH syntax to cap the total cost of data retrieval.

Advisory: Always use the EXPLAIN command before running any query that involves data costs. This allows you to understand the data licensing terms, associated costs, and the estimated data retrieval volume, ensuring there are no surprises in your data expenses.

Attribute Selection and Query Optimization

1. Attribute Selection

Always include at least one attribute in your SELECT clause, excluding system-generated attributes (those prefixed with _). This practice narrows down the data retrieved to your specific requirements, enhancing query efficiency.

2. Attribute Co-occurrence

For queries that yield the most relevant results, include attributes in the SELECT clause that commonly co-occur. This approach leverages Rosetta Stone's structured data organization to improve the relevance and efficiency of your queries.

3. Important Attribute Filtering

Prioritize filtering based on the most crucial attribute(s) in your query. When exploring data without a specific filter in mind, use an IS NOT NULL condition as a safeguard to ensure the query returns meaningful data.

Tips and Tricks for Advanced Querying

Enum Array Adherence

Strictly adhere to the enumerated values for attribute properties when applying filters. These enums ensure you're querying based on valid data values and can be retrieved from the /attribute service, providing a reference for constructing precise queries.

Filtering on Geographic Coordinates

To filter data based on geographic locations effectively, create a bounding box using both < and > operators on latitude and longitude fields. This method is particularly useful for spatial data analyses, allowing you to focus on a specific geographic area.

Example:

SELECT rs."latitude", rs."longitude"
FROM "narrative"."rosetta_stone" AS rs
WHERE rs."latitude" > 40.7128
  AND rs."latitude" < 40.7484
  AND rs."longitude" > -74.0060
  AND rs."longitude" < -73.9352;

This query example creates a bounding box around a specific area, retrieving data only within these geographic constraints.

Conclusion

Adopting these best practices when querying Rosetta Stone datasets ensures efficient and cost-effective data retrieval. By applying strategic filters, managing costs, and optimizing query structure, you can leverage the full potential of Rosetta Stone on the Narrative Data Collaboration Platform, gaining valuable insights while maintaining control over your data expenditures.