---
title: Data and ML checks
description: ML monitoring "hello world"
---
import CloudSignup from '/snippets/cloud_signup.mdx';
import CreateProject from '/snippets/create_project.mdx';
Need help? Ask on [Discord](https://discord.com/invite/xZjKRaNp8b).

This quickstart shows both local open-source and cloud workflows. You will run a simple evaluation in Python and explore the results in Evidently Cloud.

### 1.1. Set up Evidently Cloud
<CloudSignup />
### 1.2. Installation and imports
Install the Evidently Python library:
```python
!pip install evidently
```
Components to run the evals:
```python
import pandas as pd
from sklearn import datasets
from evidently.future.datasets import Dataset
from evidently.future.datasets import DataDefinition
from evidently.future.report import Report
from evidently.future.metrics import *
from evidently.future.presets import *
from evidently.future.tests import *
```
Components to connect with Evidently Cloud:
```python
from evidently.ui.workspace.cloud import CloudWorkspace
```
### 1.3. Create a Project
<CreateProject />
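For reference, connecting to Evidently Cloud and creating a Project in code looks roughly like this. It is a sketch of the same steps the snippet above covers; the token, organization ID, and project name are placeholders:

```python
# A sketch of the steps covered by the snippet above.
# The token, org ID, and project name are placeholders; use your own values.
ws = CloudWorkspace(
    token="YOUR_API_TOKEN",
    url="https://app.evidently.cloud",
)

project = ws.create_project("Data and ML checks quickstart", org_id="YOUR_ORG_ID")
```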
Let's import a toy dataset with tabular data:

```python
adult_data = datasets.fetch_openml(name="adult", version=2, as_frame="auto")
adult = adult_data.frame
```

Alternatively, you can load the same dataset from a parquet file:

```python
url = "https://github.com/evidentlyai/evidently/blob/main/test_data/adults.parquet?raw=true"
adult = pd.read_parquet(url, engine='pyarrow')
```
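Optionally, take a quick look at the loaded data before running any checks:

```python
# Optional: preview the dataset to confirm it loaded as expected
print(adult.shape)
adult.head()
```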
Let's split the data into two and introduce some artificial drift for demo purposes. The "prod" data will include people with education levels unseen in the reference dataset:

```python
adult_ref = adult[~adult.education.isin(["Some-college", "HS-grad", "Bachelors"])]
adult_prod = adult[adult.education.isin(["Some-college", "HS-grad", "Bachelors"])]
```
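If you want to double-check the artificial drift setup, you can compare which education levels end up in each split:

```python
# Optional: the "prod" split should contain education levels absent from the reference split
print(adult_ref["education"].unique())
print(adult_prod["education"].unique())
```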
Map the column types:

```python
schema = DataDefinition(
    numerical_columns=["education-num", "age", "capital-gain", "hours-per-week", "capital-loss"],
    categorical_columns=["education", "occupation", "native-country", "workclass", "marital-status"],
)
```
Create Evidently Datasets to work with:

```python
eval_data_1 = Dataset.from_pandas(
    pd.DataFrame(adult_prod),
    data_definition=schema
)

eval_data_2 = Dataset.from_pandas(
    pd.DataFrame(adult_ref),
    data_definition=schema
)
```
Let's get a summary of all columns in the dataset and run auto-generated Tests to check data quality and core statistics between the two datasets:

```python
report = Report([
    DataSummaryPreset()
],
include_tests=True)

my_eval = report.run(eval_data_1, eval_data_2)
```

Upload the Report with the evaluation results to Evidently Cloud:
```python
ws.add_run(project.id, my_eval, include_data=False)
```
**View the Report**. Go to [Evidently Cloud](https://app.evidently.cloud/), open your Project, navigate to "Reports" in the left menu and open the Report. You will see the summary with scores and Test results.
**Get a Dashboard**. As you run repeated evals, you may want to track the results in time. Go to the "Dashboard" tab in the left menu and enter the "Edit" mode. Add a new tab, and select the "Columns" template.
You'll see a set of panels that show column stats. Each has a single data point. As you log ongoing evaluation results, you can track trends and set up alerts.
**View results locally**. To see the Report in an interactive Python environment like Jupyter notebook or Colab, run:

```python
my_eval
```

This will show the summary Report. In a separate tab of the Report, you'll see the pass/fail results for all Tests.
You can also view the results as a JSON or Python dictionary:
```python
# my_eval.json()
# my_eval.dict()
```
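If you want to post-process results programmatically, you can work with the dictionary output. A minimal sketch, assuming the output exposes a top-level "metrics" list (the exact structure may differ between Evidently versions):

```python
# Iterate over computed metric entries in the evaluation output.
# Assumes a top-level "metrics" list; adjust to the structure your version returns.
results = my_eval.dict()
for metric in results.get("metrics", []):
    print(metric)
```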
Or save and open an HTML file externally:

```python
# my_eval.save_html("file.html")
```