Querying Vector Databases: Methods and Practices
Content:
- Introduction
- How are vector databases queried?
- Inserting Data
- Updating Data
- Deleting Data
- Join Operations
- Complex Queries
How are vector databases queried?
This article pairs common SQL statements with their closest equivalents in a vector database and shows how each could be used to query an example contacts database.
Vector databases, unlike traditional relational databases, are typically not queried with SQL (Structured Query Language) but through methods better suited to high-dimensional data such as vectors.
In vector databases, the focus is often on finding similarities or nearest neighbors in a high-dimensional space, rather than on matching exact scalar values or performing join operations as in SQL databases.

Let's compare common SQL queries with their conceptual equivalents in a vector database when querying a contacts database.
-
Selecting Data
SQL (Relational Database):
SELECT name, email FROM contacts WHERE city = 'New York';
Vector Database:
In vector databases, selection is usually about finding similar items rather than matching a column value. For example, you might look for contacts close to a given vector, such as a set of interests encoded as a vector.
python
# Retrieve the 10 contacts whose vectors are closest to the query vector.
similar_contacts = vector_database.query_nearest_neighbors(
    vector_representation_of_interests,
    number_of_results=10
)
Here, `vector_representation_of_interests` is a vector that represents a specific set of interests, and the query retrieves contacts with similar interests.
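To make the idea of a nearest-neighbor query concrete, here is a minimal brute-force sketch in plain Python. It assumes cosine similarity as the measure and uses made-up three-dimensional interest vectors. The function only borrows the name `query_nearest_neighbors` from the example above and is not any particular database's API. Real systems typically use approximate indexes instead of scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def query_nearest_neighbors(vectors, query, number_of_results=10):
    """Return the ids of the stored vectors most similar to `query`, best first."""
    ranked = sorted(
        vectors,
        key=lambda contact_id: cosine_similarity(vectors[contact_id], query),
        reverse=True,
    )
    return ranked[:number_of_results]

# Made-up interest vectors; the dimensions stand for (hiking, cooking, chess).
contacts = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.1, 0.8, 0.1],
    "carol": [0.8, 0.2, 0.1],
}

# Query: someone interested only in hiking.
similar_contacts = query_nearest_neighbors(contacts, [1.0, 0.0, 0.0], number_of_results=2)
print(similar_contacts)  # ['alice', 'carol']
```

Note that the result is a ranking rather than a set of exact matches: the query always returns the closest items, even when none of them is very close.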
-
Inserting Data
SQL:
INSERT INTO contacts (name, email, city) VALUES ('John Doe', 'johndoe@example.com', 'New York');
Vector Database:
In Vector Databases, you often insert data by first converting it to a vector using a model.
python
# Encode the contact's fields into a vector, then store it under the contact's ID.
contact_vector = model.encode('John Doe - johndoe@example.com - New York')
vector_database.insert(contact_id, contact_vector)
Here, `model.encode(...)` is a hypothetical function that converts the contact's information into a vector.
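The sketch below shows the encode-then-insert flow end to end. Everything in it is an assumption made for illustration: `encode` is a toy token-hashing function standing in for a real embedding model, and `InMemoryVectorStore` is a hypothetical class, not a real client. It also keeps the original fields next to the vector, since a vector alone generally cannot be turned back into the name or email it came from.

```python
import hashlib
import re

DIMENSIONS = 16  # arbitrary vector size chosen for this sketch

def encode(text):
    """Toy stand-in for model.encode: hash each token into one of DIMENSIONS buckets."""
    vector = [0.0] * DIMENSIONS
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        vector[int(digest, 16) % DIMENSIONS] += 1.0
    return vector

class InMemoryVectorStore:
    """Hypothetical store keeping each vector next to the fields it was built from."""

    def __init__(self):
        self.records = {}

    def insert(self, contact_id, vector, metadata):
        # Like a primary-key constraint: refuse to insert the same id twice.
        if contact_id in self.records:
            raise ValueError(f"contact {contact_id} already exists")
        self.records[contact_id] = {"vector": vector, "metadata": metadata}

store = InMemoryVectorStore()
contact = {"name": "John Doe", "email": "johndoe@example.com", "city": "New York"}

# Builds the same string as the example above: 'John Doe - johndoe@example.com - New York'
contact_vector = encode(" - ".join(contact.values()))
store.insert(1, contact_vector, contact)
print(len(store.records[1]["vector"]))  # 16
```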
-
Updating Data
SQL:
UPDATE contacts SET email = 'newemail@example.com' WHERE name = 'John Doe';
Vector Database:
Updating usually means recomputing the vector from the changed record and writing it back under the same identifier.
python
# Re-encode the contact with the new email and overwrite the stored vector.
new_contact_vector = model.encode('John Doe - newemail@example.com - New York')
vector_database.update(contact_id, new_contact_vector)
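As a rough illustration of why the vector has to be recomputed, the sketch below uses a toy letter-count `encode` function (an assumption, not a real embedding model) and a plain dictionary in place of the database. Changing the email changes the text, and therefore the stored vector. Many vector databases expose a combined insert-or-update operation, often called an upsert. The `update` helper here is hypothetical and deliberately stricter.

```python
import string

def encode(text):
    """Toy stand-in for model.encode: counts of each letter a-z (not a real embedding)."""
    lowered = text.lower()
    return [float(lowered.count(letter)) for letter in string.ascii_lowercase]

vectors = {}  # contact_id -> vector, standing in for the vector database

def update(contact_id, new_vector):
    """Overwrite the stored vector; refuse to create a record that was never inserted."""
    if contact_id not in vectors:
        raise KeyError(f"unknown contact id: {contact_id}")
    vectors[contact_id] = new_vector

contact_id = 42  # hypothetical identifier
vectors[contact_id] = encode('John Doe - johndoe@example.com - New York')
old_vector = vectors[contact_id]

# The email changed, so the text changed, so the vector must be recomputed.
update(contact_id, encode('John Doe - newemail@example.com - New York'))
print(vectors[contact_id] != old_vector)  # True
```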
-
Deleting Data
SQL:
DELETE FROM contacts WHERE name = 'John Doe';
Vector Database:
python
vector_database.delete(contact_id)
Deleting a record from a vector database is straightforward and resembles deleting by primary key in SQL: it relies on an identifier rather than a vector.
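The SQL statement above deletes by name, whereas the vector database call deletes by identifier. One way to bridge the two, sketched here with a plain dictionary and hypothetical helper names, is to look up the matching identifiers in the stored fields first and then delete by ID.

```python
# Hypothetical records: each id maps to its stored fields and a made-up vector.
records = {
    1: {"name": "John Doe", "vector": [0.2, 0.7]},
    2: {"name": "Jane Roe", "vector": [0.9, 0.1]},
}

def delete(contact_id):
    """Remove one record by identifier; return True if something was removed."""
    return records.pop(contact_id, None) is not None

def delete_by_name(name):
    """Resolve a name to ids via the stored fields, then delete each id."""
    ids = [cid for cid, record in records.items() if record["name"] == name]
    for cid in ids:
        delete(cid)
    return ids

removed = delete_by_name("John Doe")
print(removed, sorted(records))  # [1] [2]
```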
-
Join Operations
SQL:
SELECT contacts.name, orders.amount
FROM contacts
INNER JOIN orders ON contacts.id = orders.contact_id
WHERE contacts.city = 'New York';
Vector Database:
Join operations as known in SQL don't have a direct equivalent in vector databases, which are built around similarity in a multidimensional space rather than relationships between tables.
However, you might perform an operation to find similarities between different sets, such as finding contacts whose interests are similar to certain order characteristics.
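A common workaround, sketched below under stated assumptions, is to do the join in application code: run the similarity search first, then pass the returned identifiers to a relational store. The list `similar_contact_ids` is a hard-coded stand-in for a search result, and the tables are an in-memory SQLite database with made-up rows.

```python
import sqlite3

# Hypothetical output of a similarity search: ids of the closest contacts.
similar_contact_ids = [1, 3]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (contact_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Carol")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (2, 35.5), (3, 12.0), (3, 8.0)],
)

# One "?" placeholder per id keeps the query parameterized.
placeholders = ", ".join("?" for _ in similar_contact_ids)
rows = conn.execute(
    f"""
    SELECT contacts.name, orders.amount
    FROM contacts
    INNER JOIN orders ON contacts.id = orders.contact_id
    WHERE contacts.id IN ({placeholders})
    ORDER BY contacts.name, orders.amount
    """,
    similar_contact_ids,
).fetchall()
print(rows)  # [('Alice', 20.0), ('Carol', 8.0), ('Carol', 12.0)]
```

The vector database answers "which contacts are similar?", and the relational database answers "what did those contacts order?".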
-
Complex Queries
SQL:
SELECT name FROM contacts WHERE email LIKE '%@example.com';
Vector Database:
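Exact pattern matching such as LIKE has no direct vector equivalent. A rough approximation is to encode a description of the pattern as a vector and retrieve nearby vectors, but the results are ranked by similarity and are not guaranteed to match the pattern. Where exact matches are required, many vector databases let you combine similarity search with filters on stored fields.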
python
pattern_vector = model.encode('example.com email pattern')
matching_contacts = vector_database.query_nearest_neighbors(
    pattern_vector,
    number_of_results=10
)
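When the pattern has to match exactly, similarity alone is not enough. The sketch below combines an exact filter on a stored field with a similarity ranking. It is a brute-force illustration with made-up two-dimensional vectors, a hypothetical `filtered_search` helper, and Euclidean distance as the assumed measure. It is not the filter syntax of any specific database.

```python
import math

# Made-up contacts: stored fields plus a two-dimensional vector each.
contacts = [
    {"name": "Alice", "email": "alice@example.com", "vector": [0.9, 0.1]},
    {"name": "Bob", "email": "bob@other.org", "vector": [1.0, 0.0]},
    {"name": "Carol", "email": "carol@example.com", "vector": [0.2, 0.9]},
]

def filtered_search(query_vector, email_suffix, number_of_results=10):
    """Keep contacts whose email ends with `email_suffix`, then rank by distance."""
    # Exact part: behaves like WHERE email LIKE '%<suffix>'.
    candidates = [c for c in contacts if c["email"].endswith(email_suffix)]
    # Similarity part: smallest Euclidean distance to the query comes first.
    candidates.sort(key=lambda c: math.dist(c["vector"], query_vector))
    return [c["name"] for c in candidates[:number_of_results]]

matching_contacts = filtered_search([1.0, 0.0], "@example.com")
print(matching_contacts)  # ['Alice', 'Carol']
```

Bob's vector is the closest to the query, yet he is excluded because his email fails the exact filter.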
Key Differences
- Query Language: SQL uses a declarative syntax for data manipulation and retrieval, while vector databases are usually queried through client libraries (for example, in Python) or query APIs tailored to their architecture.
- Data Handling: Vector databases are not centered on exact scalar matches or table joins; they focus on similarity and proximity in a vector space.
- Indexing and Retrieval: Vector databases are built for efficient nearest-neighbor search in high-dimensional spaces, which calls for different indexing and search methods than those used in relational databases.
Querying a vector database requires a fundamental shift in thinking from traditional SQL queries. It’s more about understanding and leveraging the mathematical properties of vectors to find similar items than about performing exact matches and joins.