What is an Embedding?
An embedding is a numerical representation of an object or entity, such as a word, sentence, document, image, or audio clip, in a continuous vector space.
Essentially, embeddings enable machine learning models to find similar objects. Given a photo or a document, a machine learning model that uses embeddings could find a similar photo or document. Since embeddings make it possible for computers to understand the relationships between words and other objects, they are foundational for AI.
| Word | Embedding |
| --- | --- |
| cat | [0.2, 0.4, -0.1, 0.8, 0.5] |
| dog | [0.3, 0.6, -0.2, 0.7, 0.4] |
The embeddings are designed such that words with similar meanings or contexts have embeddings that are close together in the embedding space. In this example, “cat” and “dog” might have embeddings that are relatively close to each other because they often appear in similar contexts, such as “pets” or “animals”.
What is a vector in machine learning?
In mathematics, a vector is an array of numbers that defines a point in a multidimensional space. In more practical terms, a vector is a list of numbers, such as {1989, 22, 9, 180}. Each number indicates where the object lies along a specified dimension.
In machine learning, the use of vectors makes it possible to search for similar objects. A vector-searching algorithm simply has to find two vectors that are close together in a vector database.
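As a sketch of this idea, the snippet below performs a brute-force nearest-neighbor search over a tiny in-memory set of vectors. The vectors are illustrative hand-made values, not real model output; a production vector database would use approximate search rather than this linear scan.

```python
import numpy as np

# A tiny in-memory "vector store": word -> embedding (illustrative values)
vectors = {
    "cat": np.array([0.2, 0.4, -0.1, 0.8, 0.5]),
    "dog": np.array([0.3, 0.6, -0.2, 0.7, 0.4]),
    "car": np.array([-0.7, 0.1, 0.9, -0.3, 0.2]),
}

def nearest(query, vectors):
    # Return the key whose vector has the smallest Euclidean distance to the query
    return min(vectors, key=lambda k: np.linalg.norm(vectors[k] - query))

# A query vector very close to "cat"
print(nearest(np.array([0.2, 0.45, -0.1, 0.8, 0.5]), vectors))  # -> cat
```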
To store these vector embeddings we need a database. Below are a few databases that can store embeddings.
A Few Vector Databases
- Pinecone
- Chroma DB – open source and free
- Deep Lake
- SingleStore
Dot Product:
The dot product is a mathematical operation that measures the similarity between two vectors in a vector space. In the context of embeddings, such as word embeddings or sentence embeddings, the dot product can help in searching for similar items by quantifying the similarity between their embedded representations.
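For example, using the illustrative "cat" and "dog" vectors from the table above, the dot product (and its length-normalized variant, cosine similarity) can be computed with NumPy:

```python
import numpy as np

cat = np.array([0.2, 0.4, -0.1, 0.8, 0.5])
dog = np.array([0.3, 0.6, -0.2, 0.7, 0.4])

# Dot product: sum of element-wise products
dot = np.dot(cat, dog)

# Cosine similarity divides out the vector lengths, leaving only direction
cos = dot / (np.linalg.norm(cat) * np.linalg.norm(dog))

print(round(float(dot), 2))  # 1.08
print(round(float(cos), 2))  # 0.96
```

Because the raw dot product grows with vector length, cosine similarity is often preferred when comparing embeddings of different magnitudes.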
Tools used for Demo
Hugging Face:
all-MiniLM-L6-v2
This is a sentence-transformers model: It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.
Sign Up/Login: Go to the Hugging Face website (https://huggingface.co/) and sign up for an account if you don’t already have one. If you have an account, log in using your credentials.
Access Account Settings: Once you are logged in, click on your profile icon at the top right corner of the page. Then, select “Settings” from the dropdown menu.
Generate API Key: In the “Settings” page, navigate to the “Access Tokens” section. Here, you can generate a new token, which serves as your API key.
Chroma DB:
Chroma is a database for building AI applications with embeddings. It comes with everything you need to get started built in, and runs on your machine.
Install:
pip install chromadb
Create a Chroma DB client:
import chromadb
from chromadb.config import Settings
chroma_client = chromadb.Client(Settings(chroma_db_impl="duckdb+parquet", persist_directory="db/"))
DuckDB: DuckDB refers to the DuckDB database engine, an in-process analytical database system designed for fast query processing and efficient data storage.
Parquet: Parquet is a columnar storage file format commonly used in big data processing frameworks like Apache Hadoop and Apache Spark. Parquet is designed for efficient storage and processing of large datasets, especially in distributed environments. It provides features such as efficient compression, encoding, and column pruning, making it suitable for analytics workloads.
persist_directory: the directory where the data is persisted to disk.
Create a collection: a collection is analogous to a table in SQL Server.
collection = chroma_client.create_collection(name="my_collection")
Query a Collection:
Chroma collections can be queried in a variety of ways, using the .query method.
You can query with a set of query_embeddings; the query returns the n_results closest matches to each query embedding, in order.
Distance:
Distance is a metric used to measure the similarity or dissimilarity between data points or embeddings when performing a query on a collection.
# Import necessary modules and classes from the ChromaDB library
import chromadb
from chromadb.config import Settings
from chromadb.utils import embedding_functions

# Initialize the Hugging Face embedding function with the specified model and API key
hf_ef = embedding_functions.HuggingFaceEmbeddingFunction(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    api_key="<<<api Key>>>"
)

# Create a ChromaDB client with the specified settings
client = chromadb.Client(Settings(
    chroma_db_impl="duckdb+parquet",
    persist_directory="db/"
))

# Get or create a collection named "goodlog" using the ChromaDB client
coll = client.get_or_create_collection("goodlog")

# Read a log file and either add its lines to the collection or query for anomalies
def read_log_file(filename, addToColl):
    indexx = 0
    try:
        with open(filename, "r") as file:
            # Process each line in the file
            for line in file:
                indexx += 1
                process_log_line(line.strip(), indexx, addToColl)
    except FileNotFoundError:
        print(f"Error: Log file '{filename}' not found.")

# Process a single log line: add its embedding to the collection (addToColl == 1),
# or query the collection and flag anomalies (addToColl == 2)
def process_log_line(log_line, linenumber, addToColl):
    if addToColl == 1:
        # Add the log line's embedding to the collection
        logline = hf_ef([log_line])
        coll.add(embeddings=logline, documents=[log_line], ids=[str(linenumber)])
    elif addToColl == 2:
        # Query for the closest match and flag the line if the distance exceeds 0.5
        query_vector = hf_ef([log_line])
        res = coll.query(
            query_embeddings=query_vector,
            n_results=1,
            include=["distances"],
        )
        if round(res["distances"][0][0], 2) > 0.5:
            print(log_line + " ** this is Anomaly")
            print(res["distances"][0][0])
    else:
        return log_line

# Main program entry point
if __name__ == "__main__":
    # Uncomment the line below to add good log embeddings to the collection
    # read_log_file("./random_log_file.txt", 1)
    # Read the log file and search for anomalies
    read_log_file("./random_log_file_error.txt", 2)