
How DevOps practices reinforce AI/ML

The DevOps Success Story

It's mid-2020 and the software development life cycle has reached an appreciable level of maturity. Two things stand out now -

  • DevOps culture & practices have evolved immensely. From version control to build phases, CI/CD, automated tests, deployment orchestration, and cloud infrastructure-as-code - all these processes have created a synergy for successful software delivery.

  • The tool ecosystem for developing software applications is remarkably rich. And DevOps processes have revolved around these tools to 'Automate everything humanly possible!'

For the uninitiated, DevOps is defined as "a set of practices intended to reduce the time between committing a change to a system and the change being placed into normal production, while ensuring high quality".

 

DevOps has helped software businesses succeed in three major ways -

  • Collaboration: DevOps has helped break silos between software developers and operations engineers. It's a culture that promotes "We build it, we run it" over the "throw the code over the wall" paradigm.

  • Speed: DevOps practices heavily advocate the "Automate Everything" ideology and this leads to faster time to market.

  • Reliability: Use of standardised CI/CD pipelines leads to near-zero errors and reproducibility. So things fail fast (which is good, and can be fixed) or do not fail at all!

Enter Artificial Intelligence

In parallel to the rise of Cloud-native software services & DevOps, one more area that is making waves in the technology circles is Artificial Intelligence (AI).

The fathers of AI, Minsky and McCarthy, described artificial intelligence as any task performed by a program or a machine that, if a human carried out the same activity, we would say the human had to apply intelligence to accomplish the task.

Now AI has many sub-disciplines and helps solve problems in fields like planning, learning, reasoning, problem solving, knowledge representation, perception, motion and manipulation and, to a lesser extent, social intelligence and creativity.


On one hand, DevOps is the de facto standard for application development. On the other, modern ML (Machine Learning) and AI do not have a standard tooling or process ecosystem. This makes sense for a number of reasons -

  • AI research was long confined to universities and labs, which had their own development methodologies, including CRISP-DM and the Microsoft Team Data Science Process (TDSP).

  • Best practices have not yet emerged, because the tools are changing rapidly and there is a need for a single body of knowledge here.

The excerpt below, from the Microsoft Azure blog, throws more light on the topic -

"Like DevOps, these methodologies are grounded in principles and practices learned from real-world projects. AI/ML teams use an approach unique to data science projects, where there are frequent, small iterations to refine the data features, the model, and the analytics question. It's a process intended to align a business problem with AI/ML model development. The release process is not a focus for CRISP-DM or TDSP, and there is little interaction with an operations team. DevOps teams (today) are not yet familiar with the tools, languages, and artifacts of data science projects.

DevOps and AI/ML development are two independent methodologies with a common goal: to put an AI application into production. Today it takes effort to bridge the gaps between the two approaches. AI/ML projects need to incorporate some of the operational and deployment practices that make DevOps effective, and DevOps projects need to accommodate the AI/ML development process to automate the deployment and release process for AI/ML models.

DevOps for AI/ML

DevOps for AI/ML has the potential to stabilize and streamline the model release process. It is often paired with the practice and toolset to support Continuous Integration/Continuous Deployment (CI/CD). Here are some ways to consider CI/CD for AI/ML workstreams:

  • The AI/ML process relies on experimentation and iteration of models, and it can take hours or days for a model to train and test. Carve out a separate workflow to accommodate the timelines and artifacts for a model build and test cycle. Avoid gating time-sensitive application builds on AI/ML model builds.

  • For AI/ML teams, think about models as having an expectation to deliver value over time rather than a one-time construction of the model. Adopt practices and processes that plan for and allow a model lifecycle and evolution.

  • DevOps is often characterized as bringing together business, development, release, and operational expertise to deliver a solution. Ensure that AI/ML is represented on feature teams and is included throughout the design, development, and operational sessions.

Establish performance metrics and operational telemetry for AI/ML

Use metrics and telemetry to inform what models will be deployed and updated. Metrics can be standard performance measures like precision, recall, or F1 scores. Or they can be scenario specific measures like the industry-standard fraud metrics developed to inform a fraud manager about a fraud model's performance. Here are some ways to integrate AI/ML metrics into an application solution: 

  • Define model accuracy metrics and track them through model training, validation, testing, and deployment.

  • Define business metrics to capture the business impact of the model in operations. For an example see R notebook for fraud metrics.

  • Capture data metrics, like dataset sizes, volumes, update frequencies, distributions, categories, and data types. Model performance can change unexpectedly for many reasons and it's expedient to know if changes are due to data.

  • Track operational telemetry about the model:  how often is it called? By which applications or gateways? Are there problems? What are the accuracy and usage trends? How much compute or memory does the model consume?

  • Create a model performance dashboard that tracks model versions, performance metrics, and data sets.

AI/ML models need to be updated periodically. Over time, and as new and different data becomes available — or customers or seasons or trends change — a model will need to be re-trained to continue to be effective. Use metrics and telemetry to help refine the update strategy and determine when a model needs to be re-trained.

Automate the end-to-end data and model pipeline

The AI/ML pipeline is an important concept because it connects the necessary tools, processes, and data elements to produce and operationalize an AI/ML model. It also introduces another dimension of complexity for a DevOps process. One of the foundational pillars of DevOps is automation, but automating an end-to-end data and model pipeline is a byzantine integration challenge.

Workstreams in an AI/ML pipeline are typically divided between different teams of experts where each step in the process can be very detailed and intricate. It may not be practical to automate across the entire pipeline because of the difference in requirements, tools, and languages. Identify the steps in the process that can be easily automated like the data transformation scripts, or data and model quality checks. Consider the following workstreams:  

| Workstream | Description | Automation |
| --- | --- | --- |
| Data Analysis | Includes data acquisition, and focuses on exploring, profiling, cleaning, and transforming. Also includes enriching and staging data for modeling. | Develop scripts and tests to move and validate the data. Also create scripts to report on the data quality, changes, volume, and consistencies. |
| Experimentation | Includes feature engineering, model fitting, and model evaluation. | Develop scripts, tests, and documentation to reproduce the steps and capture model outputs and performance. |
| Release Process | Includes the process for deploying a model and data pipeline into production. | Integrate the AI/ML pipeline into the release process. |
| Operationalization | Includes capturing operational and performance metrics. | Create operational instrumentation for the AI/ML pipeline. For subsequent model retraining cycles, capture and store model inputs and outputs. |
| Model Re-training and Refinement | Determine a cadence for model re-training. | Instrument the AI/ML pipeline with alerts and notifications to trigger retraining. |
| Visualization | Develop an AI/ML dashboard to centralize information and metrics related to the model and data. Include accuracy, operational characteristics, business impact, history, and versions. | n/a |

An automated end-to-end process for the AI/ML pipeline can accelerate development and drive reproducibility, consistency, and efficiency across AI/ML projects."
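To make the excerpt's guidance on metrics and data telemetry concrete, here is a minimal Python sketch (using scikit-learn; the arrays and metric names are illustrative placeholders, not part of the Azure material) that records model metrics and data metrics side by side, as a dashboard feed might:

```python
# Minimal sketch: compute model metrics and data metrics together,
# as inputs to a model performance dashboard.
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Placeholder validation labels and predictions.
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 1])
X = np.random.rand(len(y_true), 4)  # placeholder feature matrix

model_metrics = {
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
}

# Data metrics: if model performance shifts, these help show
# whether the change is coming from the data rather than the model.
data_metrics = {
    "dataset_size": int(len(X)),
    "positive_rate": float(y_true.mean()),
    "feature_means": X.mean(axis=0).round(3).tolist(),
}

print(model_metrics)
print(data_metrics)
```

In practice, each such record would be stamped with the model version and training-data snapshot so that trends can be tracked across releases, as the dashboard guidance above suggests.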


The Challenges


The problems plaguing AI/ML engineers and data scientists are the need for toolchains, automation pipelines, knowledge of standard model-training frameworks, and ease of hardware access - different teams need different numbers of GPUs, FPGAs, CPUs, TPUs or even IPUs.


Here are some of the challenges, put out as questions -


  • Who manages and maintains these resources for AI teams?

  • Who administers hardware resources?

  • Who prioritizes the jobs?

  • How is the sanity of resource allocations maintained? 

  • Who supports automation scripting and defining pipelines?

  • Who handles security issues, authentication & authorization?

  • Who ensures all the accelerators and nodes are optimized?

  • How to profile slow applications and help the Data Scientists? 

  • Who maintains the toolchains and cloud servers for AI teams?

  • Who maintains any other infrastructure or systems specific issues?

  • So who is the one with the cape? 


DevOps the Superhero!


The answer to all this is again DevOps. But it's not the same DevOps from the application-development era that would fit in here! This is another beast and needs some more superpowers in addition to its core strengths. Knowledge of newer tools and practices like Kubeflow, TensorFlow, Google MLOps, Azure AI pipelines, and AWS SageMaker Studio will be required. And it's high time all this knowledge was aggregated and standardised. I will follow up with more soon; until then, enjoy this insightful white-paper from google.ai with some research findings along these lines - https://storage.googleapis.com/pub-tools-public-publication-data/pdf/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf


AWS Certified Machine Learning Specialty

I must admit that the sense of accomplishment after clearing the "AWS Certified Machine Learning Specialty" exam and the adrenaline rush when you hit the submit button are slightly addictive!



This one is special because it is my first certification from AWS. Being a DevOps Engineer, it would have been easier for me to take the "AWS Certified Solutions Architect" or "DevOps Engineer" track instead of exploring the less familiar terrain of "Machine Learning". However, stepping out of the comfort zone to learn something new made it even more fulfilling.

This post is about the learning path I followed in the run-up to this certification.

Machine Learning in itself is a vast field, and this certification only scratches the surface and gets you started.

In summary, you will need to know the following to clear this exam:

  • How to identify the problem (Supervised, Unsupervised, Classification, Regression)

  • How to choose the algorithm (Linear models, CNN, RNN, Tree Ensemble)

  • How to train your model

  • Data preparation & transformation

  • How to use AWS ecosystem to solve the above

Distribution of Questions Asked:

The topic-wise weightage of the questions asked was as follows:

| Topic | Weightage |
| --- | --- |
| Machine Learning & Deep Learning | 50% |
| AWS SageMaker | 25% |
| Other AWS services | 25% |

  • The total time to complete the exam was 3 hours

  • There were 65 questions asked

Preparing for the exam:

  1. I got started by watching `AWS Tech Talk` and `Deep Dive` videos on YouTube, not just about ML but about related services as well: https://www.youtube.com/channel/UCT-nPlVzJI-ccQXlxjSvJmw

  2. Followed the free training videos and tutorials from AWS (not all of them though): https://aws.amazon.com/training/learning-paths/machine-learning/exam-preparation

  3. ML/DL needs some high school/college-level mathematics to be revisited. Revisiting linear algebra, probability & statistics, multivariable calculus, and optimization worked for me.

  4. Data Visualisation using Jupyter notebooks.

  5. Regression and gradient descent.

  6. DL Models - CNN, RNN

  7. Worked on understanding the following concepts-

  1. Supervised, unsupervised and reinforcement learning (a short sketch contrasting the first two follows this list).

  2. Purpose of training, validation and testing data.

  3. Various ML Algorithms & Model Types-

    1. Logistic Regression

    2. Linear Regression

    3. Support Vector Machines

    4. Decision Trees / Random Forests

    5. K-means Clustering

    6. K-Nearest Neighbours
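As promised above, here is a small, hedged illustration of the supervised-versus-unsupervised distinction (a scikit-learn sketch on toy data; the arrays are made up purely for illustration):

```python
# Contrast supervised and unsupervised learning on the same toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y = np.array([0, 0, 1, 1])  # labels available -> supervised setting

clf = LogisticRegression().fit(X, y)              # learns the label mapping
print(clf.predict([[0.15, 0.15], [0.85, 0.85]]))  # -> [0 1]

km = KMeans(n_clusters=2, n_init=10).fit(X)  # no labels -> unsupervised
print(km.labels_)                            # cluster ids it discovered
```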

Once the above concepts are understood, go ahead and try out the following AWS services -


  • SageMaker

  • Rekognition

  • Polly

  • Transcribe

  • Lex

  • Translate

  • Comprehend

  • S3 including how to secure your data

  • Athena including performance

  • Kinesis Firehose and Analytics 

  • Elastic Map Reduce (EMR)

  • AWS Glue

  • QuickSight 


Those of you who regularly use AWS services won't have much of a problem grasping these.


Finally, work through a lot of practice exam questions, like the ones from the link below:


https://www.udemy.com/course/aws-machine-learning-practice-exam/


You should also have a go at the official practice exam before attempting the mains. So that was it, folks. I am still learning this discipline, and it's all volatile right now. I will feel more confident with ML once I start applying it in some real-world applications, and will write about those experiences as they come by.

My experience speaking at an Amazon Cloud Seminar in Dubai



As of now, I've spoken at a number of internal company events and nerd-fests at educational institutions. Most of these have been strictly technical demonstrations of DevOps and automation projects, or purely tutorial gigs. By no means am I an accomplished or well-known tech speaker. However, I have discovered that the introvert in me disappears once I get on stage! This experience of speaking at a large-scale AWS event was very fulfilling, so I decided to pen down what transpired. Before we get into the nitty-gritty, here are some flashes of the presentation I delivered; unfortunately, I have neither a video recording nor good photographs.




Working as a DevOps Manager at STARZPLAY, I am responsible for performance and scalability across the core building blocks of the STARZPLAY Cloud Platform. Even though we are hosted on multiple cloud platforms, AWS is where the biggest chunk of our services resides. So naturally, we have a lot of interaction with AWS architects and engineers, with whom we work to build and improve this platform. It all began with one such mail interaction, where we were invited to an AWS Containers Day seminar to share our experiences.

And the schedule looked promising!

As I expressed my desire to join the event, the AWS folks approached me to deliver a lecture about how we developed the STARZPLAY Cloud Platform and moved from a monolith to a microservices-based architecture, the challenges we faced, and how containers and ECS bailed us out of performance issues to serve millions of active subscribers while assuring service reliability. This was a great opportunity to revisit the architecture, think through and document the steps we took, and trace how we evolved over time into the technology stack we have now. I loved preparing the slides and getting ready for the talk; it helped me re-live the experience and remember the milestones, failures and successes!



Here are some of the slides that I presented. I am not writing the transcript here; that can be a blog post for another day.



And finally, once the talk was delivered, it garnered very good responses from the hundreds of attendees. I saw a lot of forks on my GitHub repos after this event and a lot of connection requests on my LinkedIn profile. So lots of networking now in the Dubai DevOps circles!

I was appreciated by the Senior Technical Account Manager of AWS, Middle East and North Africa; here's a mail in this regard.



And I was given an Amazon Echo dot as a token of appreciation, thanks folks!

Finally, I thank my DevOps team at STARZPLAY, Dubai, without whom I stand nowhere. And I thank my superiors, especially Saleem Bhatti and Faraz Arshad, and the awesome people at AWS MENA - Zeid, Keerthi and Paul. I look forward to more such sessions in the future.



Clustering (K-Means) MNIST images of handwritten digits on AWS SageMaker


SageMaker is a fully managed AWS service that covers the entire machine-learning workflow. Using an AWS SageMaker demo, we illustrate the most important relationships, basics and functional principles.
For our experiment, we use the MNIST dataset as training data. The Modified National Institute of Standards and Technology (MNIST) database is a very large database of handwritten digits that is commonly used to train various image processing systems. The database is also widely used for machine learning (ML) training and testing. The dataset was created by "remixing" the samples from the original NIST dataset. The reason for this is that the makers thought the NIST dataset was not directly suited for machine learning experiments, because its training data came from American Census Bureau staff while its test data came from American students. In the MNIST database, NIST's black-and-white images were normalised to a size of 28 x 28 pixels with anti-aliasing and grayscale values.
The MNIST database of handwritten digits includes a training set of 60,000 examples and a test set of 10,000 examples (the pickled copy used below further splits the training set into 50,000 training and 10,000 validation examples). The MNIST data is well-suited for trying out learning techniques and pattern-recognition methods on real data with minimal pre-processing and formatting.
In the following experiment, we set out to do the example listed here - https://github.com/prasanjit-/ml_notebooks/blob/master/kmeans_mnist.ipynb


The high level steps are:

- Prepare training data
- Train a model
- Deploy & validate the model
- Use the result for predictions

Refer to the Jupyter Notebook in the GitHub repo for detailed steps - https://github.com/prasanjit-/ml_notebooks/blob/master/MNISTDemo.ipynb

The below is a summary of the steps to create this training model on SageMaker:

1. Create an S3 bucket

Create an S3 bucket to hold the following -

a. The model training data
b. The model artifacts (which Amazon SageMaker generates during model training)
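If you prefer to script this step, here is a minimal boto3 sketch (the bucket name and region are placeholders; buckets outside us-east-1 need the LocationConstraint):

```python
# Sketch: create the S3 bucket with boto3.
# 'my-sagemaker-demo-bucket' and 'eu-west-1' are placeholders.
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")
s3.create_bucket(
    Bucket="my-sagemaker-demo-bucket",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)
```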
2. Create a Notebook instance

Create a Notebook instance by logging on to https://console.aws.amazon.com/sagemaker/

3. Create a new conda_python3 notebook

Once created, open the notebook instance and you will be directed to the Jupyter server. At this point, create a new conda_python3 notebook.
4. Specify the role

Specify the role and S3 bucket as follows:

from sagemaker import get_execution_role

role = get_execution_role()
bucket = 'bucket-name'
5. Download the MNIST dataset

Download the MNIST dataset to the notebook's memory. The MNIST database of handwritten digits has a training set of 60,000 examples.

%%time
import pickle, gzip, numpy, urllib.request, json

# Load the dataset
urllib.request.urlretrieve("http://deeplearning.net/data/mnist/mnist.pkl.gz", "mnist.pkl.gz")
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
6. Convert to RecordIO Format

For this example, data needs to be converted to RecordIO format - a file format for storing a sequence of records. Each record is stored as an unsigned varint specifying the length of the data, followed by the data itself as a binary blob.

Algorithms can accept input data from one or more channels. For example, an algorithm might have two channels of input data, training_data and validation_data. The configuration for each channel provides the S3 location where the input data is stored. It also provides information about the stored data: the MIME type, compression method, and whether the data is wrapped in RecordIO format.

Depending on the input mode that the algorithm supports, Amazon SageMaker either copies input data files from an S3 bucket to a local directory in the Docker container, or makes them available as input streams.

Manual transformation is not needed, since we are using the fit method of Amazon SageMaker's high-level library in this example.
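For reference, here is a hedged sketch of what the manual route would look like, reusing train_set and bucket from the steps above and the write_numpy_to_dense_tensor helper from the (v1) SageMaker Python SDK; the S3 key prefix is a placeholder:

```python
# Sketch: manually convert numpy data to RecordIO-protobuf and upload to S3.
import io
import boto3
import sagemaker.amazon.common as smac

buf = io.BytesIO()
smac.write_numpy_to_dense_tensor(buf, train_set[0])  # vectors only, no labels
buf.seek(0)

boto3.resource("s3").Bucket(bucket).Object(
    "kmeans_lowlevel_example/data/recordio-pb-data"
).upload_fileobj(buf)
```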
7. Create a training job

In this example we will use the Amazon SageMaker KMeans module. Import KMeans and configure the estimator as follows:

from sagemaker import KMeans

data_location = 's3://{}/kmeans_highlevel_example/data'.format(bucket)
output_location = 's3://{}/kmeans_example/output'.format(bucket)

print('training data will be uploaded to: {}'.format(data_location))
print('training artifacts will be uploaded to: {}'.format(output_location))

kmeans = KMeans(role=role,
                train_instance_count=2,
                train_instance_type='ml.c4.8xlarge',
                output_path=output_location,
                k=10,
                data_location=data_location)

  • role - The IAM role that Amazon SageMaker can assume to perform tasks on your behalf (for example, reading training results, called model artifacts, from the S3 bucket and writing training results to Amazon S3).

  • output_path - The S3 location where Amazon SageMaker stores the training results.

  • train_instance_count and train_instance_type - The number and type of ML EC2 compute instances to use for model training.

  • k - The number of clusters to create. For more information, see K-Means Hyperparameters.

  • data_location - The S3 location where the high-level library uploads the transformed training data.
8. Start Model Training
%%time
kmeans.fit(kmeans.record_set(train_set[0]))
9. Deploy a Model

Deploying a model is a 3-step process:

  • Create a Model - a CreateModel request provides information such as the location of the S3 bucket that contains your model artifacts and the registry path of the image that contains the inference code.

  • Create an Endpoint Configuration - a CreateEndpointConfig request provides the resource configuration for hosting. This includes the type and number of ML compute instances to launch for deploying the model.

  • Create an Endpoint - a CreateEndpoint request creates the endpoint. Amazon SageMaker launches the ML compute instances and deploys the model.

The high-level Python library's deploy method performs all of these tasks.

%%time
kmeans_predictor = kmeans.deploy(initial_instance_count=1,
                                 instance_type='ml.m4.xlarge')

The sagemaker.amazon.kmeans.KMeans instance knows the registry path of the image that contains the k-means inference code, so you don't need to provide it. This is a synchronous operation; the method waits until the deployment completes before returning a kmeans_predictor.
10. Validate the Model

Here we get an inference for the image at index 30 in the train_set dataset:

result = kmeans_predictor.predict(train_set[0][30:31])
print(result)

The result shows the closest cluster and the distance from that cluster.
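As a hedged sketch (the record layout below follows the v1 SageMaker Python SDK used in this post; verify against your SDK version), the returned record can be unpacked, and the endpoint deleted when you are done to stop incurring charges:

```python
# Sketch: unpack the prediction record and clean up the endpoint.
record = result[0]
cluster = record.label["closest_cluster"].float32_tensor.values[0]
distance = record.label["distance_to_cluster"].float32_tensor.values[0]
print("closest cluster: {}, distance: {}".format(cluster, distance))

# Tear down the hosted endpoint when finished.
import sagemaker
sagemaker.Session().delete_endpoint(kmeans_predictor.endpoint)
```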
This video has a complete demonstration of this experiment.

Below is the set of commands that were executed and the results of the execution:



sagemaker-prasanjit-01
In [1]:
from sagemaker import get_execution_role

role = get_execution_role()
bucket = 'sagemaker-ps-01' # Use the name of your s3 bucket here
In [2]:
role
Out[2]:
'arn:aws:iam::779615490104:role/service-role/AmazonSageMaker-ExecutionRole-20191103T150143'
In [3]:
%%time
import pickle, gzip, numpy, urllib.request, json

# Load the dataset
urllib.request.urlretrieve("http://deeplearning.net/data/mnist/mnist.pkl.gz", "mnist.pkl.gz")
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
CPU times: user 892 ms, sys: 278 ms, total: 1.17 s
Wall time: 4.6 s
In [6]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (2,10)


def show_digit(img, caption='', subplot=None):
    if subplot == None:
        _, (subplot) = plt.subplots(1,1)
    imgr = img.reshape((28,28))
    subplot.axis('off')
    subplot.imshow(imgr, cmap='gray')
    plt.title(caption)

show_digit(train_set[0][1], 'This is a {}'.format(train_set[1][1]))
In [7]:
from sagemaker import KMeans

data_location = 's3://{}/kmeans_highlevel_example/data'.format(bucket)
output_location = 's3://{}/kmeans_highlevel_example/output'.format(bucket)

print('training data will be uploaded to: {}'.format(data_location))
print('training artifacts will be uploaded to: {}'.format(output_location))

kmeans = KMeans(role=role,
                train_instance_count=2,
                train_instance_type='ml.c4.8xlarge',
                output_path=output_location,
                k=10,
                epochs=100,
                data_location=data_location)
training data will be uploaded to: s3://sagemaker-ps-01/kmeans_highlevel_example/data
training artifacts will be uploaded to: s3://sagemaker-ps-01/kmeans_highlevel_example/output
In [8]:
%%time

kmeans.fit(kmeans.record_set(train_set[0]))
2019-11-03 11:45:01 Starting - Starting the training job...
2019-11-03 11:45:03 Starting - Launching requested ML instances......
2019-11-03 11:46:02 Starting - Preparing the instances for training...
2019-11-03 11:46:43 Downloading - Downloading input data...
2019-11-03 11:47:26 Training - Training image download completed. Training in progress..Docker entrypoint called with argument(s): train
[11/03/2019 11:47:28 INFO 140552810366784] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'_enable_profiler': u'false', u'_tuning_objective_metric': u'', u'_num_gpus': u'auto', u'local_lloyd_num_trials': u'auto', u'_log_level': u'info', u'_kvstore': u'auto', u'local_lloyd_init_method': u'kmeans++', u'force_dense': u'true', u'epochs': u'1', u'init_method': u'random', u'local_lloyd_tol': u'0.0001', u'local_lloyd_max_iter': u'300', u'_disable_wait_to_read': u'false', u'extra_center_factor': u'auto', u'eval_metrics': u'["msd"]', u'_num_kv_servers': u'1', u'mini_batch_size': u'5000', u'half_life_time_size': u'0', u'_num_slices': u'1'}
[11/03/2019 11:47:28 INFO 140552810366784] Reading provided configuration from /opt/ml/input/config/hyperparameters.json: {u'epochs': u'100', u'feature_dim': u'784', u'k': u'10', u'force_dense': u'True'}
[11/03/2019 11:47:28 INFO 140552810366784] Final configuration: {u'_tuning_objective_metric': u'', u'extra_center_factor': u'auto', u'local_lloyd_init_method': u'kmeans++', u'force_dense': u'True', u'epochs': u'100', u'feature_dim': u'784', u'local_lloyd_tol': u'0.0001', u'_disable_wait_to_read': u'false', u'eval_metrics': u'["msd"]', u'_num_kv_servers': u'1', u'mini_batch_size': u'5000', u'_enable_profiler': u'false', u'_num_gpus': u'auto', u'local_lloyd_num_trials': u'auto', u'_log_level': u'info', u'init_method': u'random', u'half_life_time_size': u'0', u'local_lloyd_max_iter': u'300', u'_kvstore': u'auto', u'k': u'10', u'_num_slices': u'1'}
[11/03/2019 11:47:28 WARNING 140552810366784] Loggers have already been setup.
[11/03/2019 11:47:28 INFO 140552810366784] Environment: {'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/dc163b99-1521-4ccb-ad30-92ce3ffc3cce', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'DMLC_PS_ROOT_PORT': '9000', 'DMLC_NUM_WORKER': '2', 'SAGEMAKER_HTTP_PORT': '8080', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'MXNET_KVSTORE_BIGARRAY_BOUND': '400000000', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'DMLC_PS_ROOT_URI': '10.0.229.182', 'AWS_REGION': 'eu-west-1', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'kmeans-2019-11-03-11-45-00-997', 'HOME': '/root', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '18', 'HOSTNAME': 'ip-10-0-208-60.eu-west-1.compute.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/4055df43-a805-42f4-8085-40b6d8b6ab74', 'DMLC_ROLE': 'worker', 'PWD': '/', 'DMLC_NUM_SERVER': '1', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:eu-west-1:779615490104:training-job/kmeans-2019-11-03-11-45-00-997', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'}
Process 1 is a worker.
[11/03/2019 11:47:28 INFO 140552810366784] Using default worker.
[11/03/2019 11:47:28 INFO 140552810366784] Loaded iterator creator application/x-recordio-protobuf for content type ('application/x-recordio-protobuf', '1.0')
[11/03/2019 11:47:28 INFO 140552810366784] Create Store: dist_async
Docker entrypoint called with argument(s): train
[11/03/2019 11:47:29 INFO 140169171593024] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'_enable_profiler': u'false', u'_tuning_objective_metric': u'', u'_num_gpus': u'auto', u'local_lloyd_num_trials': u'auto', u'_log_level': u'info', u'_kvstore': u'auto', u'local_lloyd_init_method': u'kmeans++', u'force_dense': u'true', u'epochs': u'1', u'init_method': u'random', u'local_lloyd_tol': u'0.0001', u'local_lloyd_max_iter': u'300', u'_disable_wait_to_read': u'false', u'extra_center_factor': u'auto', u'eval_metrics': u'["msd"]', u'_num_kv_servers': u'1', u'mini_batch_size': u'5000', u'half_life_time_size': u'0', u'_num_slices': u'1'}
[11/03/2019 11:47:29 INFO 140169171593024] Reading provided configuration from /opt/ml/input/config/hyperparameters.json: {u'epochs': u'100', u'feature_dim': u'784', u'k': u'10', u'force_dense': u'True'}
[11/03/2019 11:47:29 INFO 140169171593024] Final configuration: {u'_tuning_objective_metric': u'', u'extra_center_factor': u'auto', u'local_lloyd_init_method': u'kmeans++', u'force_dense': u'True', u'epochs': u'100', u'feature_dim': u'784', u'local_lloyd_tol': u'0.0001', u'_disable_wait_to_read': u'false', u'eval_metrics': u'["msd"]', u'_num_kv_servers': u'1', u'mini_batch_size': u'5000', u'_enable_profiler': u'false', u'_num_gpus': u'auto', u'local_lloyd_num_trials': u'auto', u'_log_level': u'info', u'init_method': u'random', u'half_life_time_size': u'0', u'local_lloyd_max_iter': u'300', u'_kvstore': u'auto', u'k': u'10', u'_num_slices': u'1'}
[11/03/2019 11:47:29 WARNING 140169171593024] Loggers have already been setup.
[11/03/2019 11:47:29 INFO 140169171593024] Launching parameter server for role scheduler
[11/03/2019 11:47:29 INFO 140169171593024] {'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/86d7c856-2158-4dd0-a0f9-7e34716c8d05', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'SAGEMAKER_HTTP_PORT': '8080', 'HOME': '/root', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'MXNET_KVSTORE_BIGARRAY_BOUND': '400000000', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'AWS_REGION': 'eu-west-1', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'kmeans-2019-11-03-11-45-00-997', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '18', 'HOSTNAME': 'ip-10-0-229-182.eu-west-1.compute.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/05ea5ad8-333f-415c-981c-e8b507b70f15', 'PWD': '/', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:eu-west-1:779615490104:training-job/kmeans-2019-11-03-11-45-00-997', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'}
[11/03/2019 11:47:29 INFO 140169171593024] envs={'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/86d7c856-2158-4dd0-a0f9-7e34716c8d05', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'DMLC_NUM_WORKER': '2', 'DMLC_PS_ROOT_PORT': '9000', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'SAGEMAKER_HTTP_PORT': '8080', 'HOME': '/root', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'MXNET_KVSTORE_BIGARRAY_BOUND': '400000000', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'DMLC_PS_ROOT_URI': '10.0.229.182', 'AWS_REGION': 'eu-west-1', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'kmeans-2019-11-03-11-45-00-997', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '18', 'HOSTNAME': 'ip-10-0-229-182.eu-west-1.compute.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/05ea5ad8-333f-415c-981c-e8b507b70f15', 'DMLC_ROLE': 'scheduler', 'PWD': '/', 'DMLC_NUM_SERVER': '1', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:eu-west-1:779615490104:training-job/kmeans-2019-11-03-11-45-00-997', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'}
[11/03/2019 11:47:29 INFO 140169171593024] Launching parameter server for role server
[11/03/2019 11:47:29 INFO 140169171593024] {'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/86d7c856-2158-4dd0-a0f9-7e34716c8d05', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'SAGEMAKER_HTTP_PORT': '8080', 'HOME': '/root', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'MXNET_KVSTORE_BIGARRAY_BOUND': '400000000', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'AWS_REGION': 'eu-west-1', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'kmeans-2019-11-03-11-45-00-997', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '18', 'HOSTNAME': 'ip-10-0-229-182.eu-west-1.compute.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/05ea5ad8-333f-415c-981c-e8b507b70f15', 'PWD': '/', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:eu-west-1:779615490104:training-job/kmeans-2019-11-03-11-45-00-997', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'}
[11/03/2019 11:47:29 INFO 140169171593024] envs={'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/86d7c856-2158-4dd0-a0f9-7e34716c8d05', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'DMLC_NUM_WORKER': '2', 'DMLC_PS_ROOT_PORT': '9000', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'SAGEMAKER_HTTP_PORT': '8080', 'HOME': '/root', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'MXNET_KVSTORE_BIGARRAY_BOUND': '400000000', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'DMLC_PS_ROOT_URI': '10.0.229.182', 'AWS_REGION': 'eu-west-1', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'kmeans-2019-11-03-11-45-00-997', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '18', 'HOSTNAME': 'ip-10-0-229-182.eu-west-1.compute.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/05ea5ad8-333f-415c-981c-e8b507b70f15', 'DMLC_ROLE': 'server', 'PWD': '/', 'DMLC_NUM_SERVER': '1', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:eu-west-1:779615490104:training-job/kmeans-2019-11-03-11-45-00-997', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'}
[11/03/2019 11:47:29 INFO 140169171593024] Environment: {'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/86d7c856-2158-4dd0-a0f9-7e34716c8d05', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'DMLC_PS_ROOT_PORT': '9000', 'DMLC_NUM_WORKER': '2', 'SAGEMAKER_HTTP_PORT': '8080', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'MXNET_KVSTORE_BIGARRAY_BOUND': '400000000', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'DMLC_PS_ROOT_URI': '10.0.229.182', 'AWS_REGION': 'eu-west-1', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'kmeans-2019-11-03-11-45-00-997', 'HOME': '/root', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '18', 'HOSTNAME': 'ip-10-0-229-182.eu-west-1.compute.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/05ea5ad8-333f-415c-981c-e8b507b70f15', 'DMLC_ROLE': 'worker', 'PWD': '/', 'DMLC_NUM_SERVER': '1', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:eu-west-1:779615490104:training-job/kmeans-2019-11-03-11-45-00-997', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'}
Process 109 is a shell:scheduler.
Process 118 is a shell:server.
Process 1 is a worker.
[11/03/2019 11:47:29 INFO 140169171593024] Using default worker.
[11/03/2019 11:47:29 INFO 140169171593024] Loaded iterator creator application/x-recordio-protobuf for content type ('application/x-recordio-protobuf', '1.0')
[11/03/2019 11:47:29 INFO 140169171593024] Create Store: dist_async
[11/03/2019 11:47:30 INFO 140552810366784] nvidia-smi took: 0.0252320766449 secs to identify 0 gpus
[11/03/2019 11:47:30 INFO 140552810366784] Number of GPUs being used: 0
[11/03/2019 11:47:30 INFO 140552810366784] Setting up with params: {u'_tuning_objective_metric': u'', u'extra_center_factor': u'auto', u'local_lloyd_init_method': u'kmeans++', u'force_dense': u'True', u'epochs': u'100', u'feature_dim': u'784', u'local_lloyd_tol': u'0.0001', u'_disable_wait_to_read': u'false', u'eval_metrics': u'["msd"]', u'_num_kv_servers': u'1', u'mini_batch_size': u'5000', u'_enable_profiler': u'false', u'_num_gpus': u'auto', u'local_lloyd_num_trials': u'auto', u'_log_level': u'info', u'init_method': u'random', u'half_life_time_size': u'0', u'local_lloyd_max_iter': u'300', u'_kvstore': u'auto', u'k': u'10', u'_num_slices': u'1'}
[11/03/2019 11:47:30 INFO 140552810366784] 'extra_center_factor' was set to 'auto', evaluated to 10.
[11/03/2019 11:47:30 INFO 140552810366784] Number of GPUs being used: 0
[11/03/2019 11:47:30 INFO 140552810366784] number of center slices 1
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "Number of Batches Since Last Reset": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "Number of Records Since Last Reset": {"count": 1, "max": 5000, "sum": 5000.0, "min": 5000}, "Total Batches Seen": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "Total Records Seen": {"count": 1, "max": 5000, "sum": 5000.0, "min": 5000}, "Max Records Seen Between Resets": {"count": 1, "max": 5000, "sum": 5000.0, "min": 5000}, "Reset Count": {"count": 1, "max": 0, "sum": 0.0, "min": 0}}, "EndTime": 1572781650.394244, "Dimensions": {"Host": "algo-2", "Meta": "init_train_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale"}, "StartTime": 1572781650.394209}

[2019-11-03 11:47:30.417] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 0, "duration": 87, "num_examples": 1, "num_bytes": 15820000}
[2019-11-03 11:47:30.596] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 1, "duration": 178, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:30 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:30 INFO 140552810366784] #progress_metric: host=algo-2, completed 1 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 6, "sum": 6.0, "min": 6}, "Total Records Seen": {"count": 1, "max": 30000, "sum": 30000.0, "min": 30000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1572781650.596894, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 0}, "StartTime": 1572781650.417194}

[11/03/2019 11:47:30 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=139020.312597 records/second
[11/03/2019 11:47:30 INFO 140169171593024] nvidia-smi took: 0.025279045105 secs to identify 0 gpus
[11/03/2019 11:47:30 INFO 140169171593024] Number of GPUs being used: 0
[11/03/2019 11:47:30 INFO 140169171593024] Setting up with params: {u'_tuning_objective_metric': u'', u'extra_center_factor': u'auto', u'local_lloyd_init_method': u'kmeans++', u'force_dense': u'True', u'epochs': u'100', u'feature_dim': u'784', u'local_lloyd_tol': u'0.0001', u'_disable_wait_to_read': u'false', u'eval_metrics': u'["msd"]', u'_num_kv_servers': u'1', u'mini_batch_size': u'5000', u'_enable_profiler': u'false', u'_num_gpus': u'auto', u'local_lloyd_num_trials': u'auto', u'_log_level': u'info', u'init_method': u'random', u'half_life_time_size': u'0', u'local_lloyd_max_iter': u'300', u'_kvstore': u'auto', u'k': u'10', u'_num_slices': u'1'}
[11/03/2019 11:47:30 INFO 140169171593024] 'extra_center_factor' was set to 'auto', evaluated to 10.
[11/03/2019 11:47:30 INFO 140169171593024] Number of GPUs being used: 0
[11/03/2019 11:47:30 INFO 140169171593024] number of center slices 1
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "Number of Batches Since Last Reset": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "Number of Records Since Last Reset": {"count": 1, "max": 5000, "sum": 5000.0, "min": 5000}, "Total Batches Seen": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "Total Records Seen": {"count": 1, "max": 5000, "sum": 5000.0, "min": 5000}, "Max Records Seen Between Resets": {"count": 1, "max": 5000, "sum": 5000.0, "min": 5000}, "Reset Count": {"count": 1, "max": 0, "sum": 0.0, "min": 0}}, "EndTime": 1572781650.390149, "Dimensions": {"Host": "algo-1", "Meta": "init_train_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale"}, "StartTime": 1572781650.390114}

[2019-11-03 11:47:30.413] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 0, "duration": 88, "num_examples": 1, "num_bytes": 15820000}
[2019-11-03 11:47:30.610] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 1, "duration": 196, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:30 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:30 INFO 140169171593024] #progress_metric: host=algo-1, completed 1 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 6, "sum": 6.0, "min": 6}, "Total Records Seen": {"count": 1, "max": 30000, "sum": 30000.0, "min": 30000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1572781650.611, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 0}, "StartTime": 1572781650.413488}

[11/03/2019 11:47:30 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=126474.646596 records/second
[2019-11-03 11:47:30.732] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 3, "duration": 120, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:30 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:30 INFO 140169171593024] #progress_metric: host=algo-1, completed 2 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 11, "sum": 11.0, "min": 11}, "Total Records Seen": {"count": 1, "max": 55000, "sum": 55000.0, "min": 55000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 2, "sum": 2.0, "min": 2}}, "EndTime": 1572781650.732486, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 1}, "StartTime": 1572781650.611256}

[11/03/2019 11:47:30 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=206017.191414 records/second
[2019-11-03 11:47:30.853] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 5, "duration": 120, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:30 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:30 INFO 140169171593024] #progress_metric: host=algo-1, completed 3 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 16, "sum": 16.0, "min": 16}, "Total Records Seen": {"count": 1, "max": 80000, "sum": 80000.0, "min": 80000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 3, "sum": 3.0, "min": 3}}, "EndTime": 1572781650.854186, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 2}, "StartTime": 1572781650.732736}

[11/03/2019 11:47:30 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=205620.877095 records/second
[2019-11-03 11:47:30.962] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 7, "duration": 106, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:30 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:30 INFO 140169171593024] #progress_metric: host=algo-1, completed 4 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 21, "sum": 21.0, "min": 21}, "Total Records Seen": {"count": 1, "max": 105000, "sum": 105000.0, "min": 105000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 4, "sum": 4.0, "min": 4}}, "EndTime": 1572781650.96329, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 3}, "StartTime": 1572781650.856089}

[11/03/2019 11:47:30 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=232952.697479 records/second
[2019-11-03 11:47:31.061] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 9, "duration": 97, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 5 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 26, "sum": 26.0, "min": 26}, "Total Records Seen": {"count": 1, "max": 130000, "sum": 130000.0, "min": 130000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 5, "sum": 5.0, "min": 5}}, "EndTime": 1572781651.061609, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 4}, "StartTime": 1572781650.963495}

[11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=254481.560222 records/second
[2019-11-03 11:47:31.176] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 11, "duration": 114, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 6 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 31, "sum": 31.0, "min": 31}, "Total Records Seen": {"count": 1, "max": 155000, "sum": 155000.0, "min": 155000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 6, "sum": 6.0, "min": 6}}, "EndTime": 1572781651.177087, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 5}, "StartTime": 1572781651.061859}

[11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=216692.257301 records/second
[2019-11-03 11:47:30.711] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 3, "duration": 114, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:30 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:30 INFO 140552810366784] #progress_metric: host=algo-2, completed 2 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 11, "sum": 11.0, "min": 11}, "Total Records Seen": {"count": 1, "max": 55000, "sum": 55000.0, "min": 55000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 2, "sum": 2.0, "min": 2}}, "EndTime": 1572781650.712005, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 1}, "StartTime": 1572781650.597101}

[11/03/2019 11:47:30 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=217345.775486 records/second
[2019-11-03 11:47:30.825] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 5, "duration": 113, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:30 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:30 INFO 140552810366784] #progress_metric: host=algo-2, completed 3 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 16, "sum": 16.0, "min": 16}, "Total Records Seen": {"count": 1, "max": 80000, "sum": 80000.0, "min": 80000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 3, "sum": 3.0, "min": 3}}, "EndTime": 1572781650.826047, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 2}, "StartTime": 1572781650.712255}

[11/03/2019 11:47:30 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=219428.877551 records/second
[2019-11-03 11:47:30.942] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 7, "duration": 114, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:30 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:30 INFO 140552810366784] #progress_metric: host=algo-2, completed 4 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 21, "sum": 21.0, "min": 21}, "Total Records Seen": {"count": 1, "max": 105000, "sum": 105000.0, "min": 105000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 4, "sum": 4.0, "min": 4}}, "EndTime": 1572781650.943047, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 3}, "StartTime": 1572781650.826549}

[11/03/2019 11:47:30 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=214336.728541 records/second
[2019-11-03 11:47:31.046] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 9, "duration": 102, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 5 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 26, "sum": 26.0, "min": 26}, "Total Records Seen": {"count": 1, "max": 130000, "sum": 130000.0, "min": 130000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 5, "sum": 5.0, "min": 5}}, "EndTime": 1572781651.046523, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 4}, "StartTime": 1572781650.943299}

[11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=241870.421288 records/second
[2019-11-03 11:47:31.143] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 11, "duration": 96, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 6 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 31, "sum": 31.0, "min": 31}, "Total Records Seen": {"count": 1, "max": 155000, "sum": 155000.0, "min": 155000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 6, "sum": 6.0, "min": 6}}, "EndTime": 1572781651.144019, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 5}, "StartTime": 1572781651.046998}

[11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=257288.957374 records/second
[2019-11-03 11:47:31.244] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 13, "duration": 99, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 7 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 36, "sum": 36.0, "min": 36}, "Total Records Seen": {"count": 1, "max": 180000, "sum": 180000.0, "min": 180000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 7, "sum": 7.0, "min": 7}}, "EndTime": 1572781651.244924, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 6}, "StartTime": 1572781651.144272}

[11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=248028.100718 records/second
[2019-11-03 11:47:31.344] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 15, "duration": 99, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 8 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 41, "sum": 41.0, "min": 41}, "Total Records Seen": {"count": 1, "max": 205000, "sum": 205000.0, "min": 205000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 8, "sum": 8.0, "min": 8}}, "EndTime": 1572781651.345334, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 7}, "StartTime": 1572781651.245178}

[11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=249264.503124 records/second
[2019-11-03 11:47:31.437] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 17, "duration": 91, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 9 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 46, "sum": 46.0, "min": 46}, "Total Records Seen": {"count": 1, "max": 230000, "sum": 230000.0, "min": 230000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 9, "sum": 9.0, "min": 9}}, "EndTime": 1572781651.437796, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 8}, "StartTime": 1572781651.345584}

[11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=270515.787328 records/second
[2019-11-03 11:47:31.544] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 19, "duration": 105, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 10 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 51, "sum": 51.0, "min": 51}, "Total Records Seen": {"count": 1, "max": 255000, "sum": 255000.0, "min": 255000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 10, "sum": 10.0, "min": 10}}, "EndTime": 1572781651.54472, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 9}, "StartTime": 1572781651.438118}

[11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=234205.612486 records/second
[2019-11-03 11:47:31.299] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 13, "duration": 120, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 7 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 36, "sum": 36.0, "min": 36}, "Total Records Seen": {"count": 1, "max": 180000, "sum": 180000.0, "min": 180000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 7, "sum": 7.0, "min": 7}}, "EndTime": 1572781651.300212, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 6}, "StartTime": 1572781651.179075}

[11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=206112.356017 records/second
[2019-11-03 11:47:31.417] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 15, "duration": 117, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 8 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 41, "sum": 41.0, "min": 41}, "Total Records Seen": {"count": 1, "max": 205000, "sum": 205000.0, "min": 205000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 8, "sum": 8.0, "min": 8}}, "EndTime": 1572781651.418261, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 7}, "StartTime": 1572781651.300484}

[11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=212013.425533 records/second
[2019-11-03 11:47:31.537] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 17, "duration": 118, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 9 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 46, "sum": 46.0, "min": 46}, "Total Records Seen": {"count": 1, "max": 230000, "sum": 230000.0, "min": 230000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 9, "sum": 9.0, "min": 9}}, "EndTime": 1572781651.537545, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 8}, "StartTime": 1572781651.41851}

[11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=209763.02597 records/second
[2019-11-03 11:47:31.659] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 19, "duration": 119, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 10 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 51, "sum": 51.0, "min": 51}, "Total Records Seen": {"count": 1, "max": 255000, "sum": 255000.0, "min": 255000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 10, "sum": 10.0, "min": 10}}, "EndTime": 1572781651.659652, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 9}, "StartTime": 1572781651.539169}

[11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=207255.491823 records/second
[2019-11-03 11:47:31.766] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 21, "duration": 106, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 11 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 56, "sum": 56.0, "min": 56}, "Total Records Seen": {"count": 1, "max": 280000, "sum": 280000.0, "min": 280000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 11, "sum": 11.0, "min": 11}}, "EndTime": 1572781651.766884, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 10}, "StartTime": 1572781651.66}

[11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=233601.411532 records/second
[2019-11-03 11:47:31.880] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 23, "duration": 113, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 12 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 61, "sum": 61.0, "min": 61}, "Total Records Seen": {"count": 1, "max": 305000, "sum": 305000.0, "min": 305000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 12, "sum": 12.0, "min": 12}}, "EndTime": 1572781651.882341, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 11}, "StartTime": 1572781651.767134}

[11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=216771.099282 records/second
[2019-11-03 11:47:32.005] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 25, "duration": 122, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 13 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 66, "sum": 66.0, "min": 66}, "Total Records Seen": {"count": 1, "max": 330000, "sum": 330000.0, "min": 330000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 13, "sum": 13.0, "min": 13}}, "EndTime": 1572781652.006303, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 12}, "StartTime": 1572781651.882572}

[11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=201845.642106 records/second
[2019-11-03 11:47:32.126] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 27, "duration": 120, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 14 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 71, "sum": 71.0, "min": 71}, "Total Records Seen": {"count": 1, "max": 355000, "sum": 355000.0, "min": 355000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 14, "sum": 14.0, "min": 14}}, "EndTime": 1572781652.12742, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 13}, "StartTime": 1572781652.006544}

[11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=206517.890027 records/second
[2019-11-03 11:47:31.671] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 21, "duration": 126, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 11 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 56, "sum": 56.0, "min": 56}, "Total Records Seen": {"count": 1, "max": 280000, "sum": 280000.0, "min": 280000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 11, "sum": 11.0, "min": 11}}, "EndTime": 1572781651.671943, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 10}, "StartTime": 1572781651.544972}

[11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=196678.558433 records/second
[2019-11-03 11:47:31.777] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 23, "duration": 105, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 12 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 61, "sum": 61.0, "min": 61}, "Total Records Seen": {"count": 1, "max": 305000, "sum": 305000.0, "min": 305000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 12, "sum": 12.0, "min": 12}}, "EndTime": 1572781651.778138, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 11}, "StartTime": 1572781651.672195}

[11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=235702.324034 records/second
[2019-11-03 11:47:31.885] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 25, "duration": 106, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 13 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 66, "sum": 66.0, "min": 66}, "Total Records Seen": {"count": 1, "max": 330000, "sum": 330000.0, "min": 330000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 13, "sum": 13.0, "min": 13}}, "EndTime": 1572781651.885934, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 12}, "StartTime": 1572781651.778343}

[11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=232080.829543 records/second
[2019-11-03 11:47:31.995] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 27, "duration": 107, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 14 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 71, "sum": 71.0, "min": 71}, "Total Records Seen": {"count": 1, "max": 355000, "sum": 355000.0, "min": 355000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 14, "sum": 14.0, "min": 14}}, "EndTime": 1572781651.996102, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 13}, "StartTime": 1572781651.887734}

[11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=230376.771092 records/second
[2019-11-03 11:47:32.102] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 29, "duration": 103, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 15 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 76, "sum": 76.0, "min": 76}, "Total Records Seen": {"count": 1, "max": 380000, "sum": 380000.0, "min": 380000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 15, "sum": 15.0, "min": 15}}, "EndTime": 1572781652.102514, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 14}, "StartTime": 1572781651.998441}

[11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=239888.906428 records/second
[2019-11-03 11:47:32.208] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 31, "duration": 104, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 16 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 81, "sum": 81.0, "min": 81}, "Total Records Seen": {"count": 1, "max": 405000, "sum": 405000.0, "min": 405000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 16, "sum": 16.0, "min": 16}}, "EndTime": 1572781652.209094, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 15}, "StartTime": 1572781652.10273}

[11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=234803.482498 records/second
[2019-11-03 11:47:32.323] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 33, "duration": 112, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 17 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 86, "sum": 86.0, "min": 86}, "Total Records Seen": {"count": 1, "max": 430000, "sum": 430000.0, "min": 430000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 17, "sum": 17.0, "min": 17}}, "EndTime": 1572781652.324047, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 16}, "StartTime": 1572781652.210925}

[11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=220762.137353 records/second
[2019-11-03 11:47:32.416] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 35, "duration": 90, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 18 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 91, "sum": 91.0, "min": 91}, "Total Records Seen": {"count": 1, "max": 455000, "sum": 455000.0, "min": 455000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 18, "sum": 18.0, "min": 18}}, "EndTime": 1572781652.417163, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 17}, "StartTime": 1572781652.325707}

[11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=272922.387384 records/second
[2019-11-03 11:47:32.526] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 37, "duration": 107, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 19 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 96, "sum": 96.0, "min": 96}, "Total Records Seen": {"count": 1, "max": 480000, "sum": 480000.0, "min": 480000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 19, "sum": 19.0, "min": 19}}, "EndTime": 1572781652.527196, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 18}, "StartTime": 1572781652.417384}

[11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=227432.165709 records/second
[2019-11-03 11:47:32.626] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 39, "duration": 97, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 20 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 101, "sum": 101.0, "min": 101}, "Total Records Seen": {"count": 1, "max": 505000, "sum": 505000.0, "min": 505000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 20, "sum": 20.0, "min": 20}}, "EndTime": 1572781652.62669, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 19}, "StartTime": 1572781652.528697}

[11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=254831.607036 records/second
[2019-11-03 11:47:32.248] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 29, "duration": 120, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 15 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 76, "sum": 76.0, "min": 76}, "Total Records Seen": {"count": 1, "max": 380000, "sum": 380000.0, "min": 380000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 15, "sum": 15.0, "min": 15}}, "EndTime": 1572781652.249401, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 14}, "StartTime": 1572781652.127715}

[11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=205183.516847 records/second
[2019-11-03 11:47:32.377] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 31, "duration": 125, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 16 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 81, "sum": 81.0, "min": 81}, "Total Records Seen": {"count": 1, "max": 405000, "sum": 405000.0, "min": 405000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 16, "sum": 16.0, "min": 16}}, "EndTime": 1572781652.378822, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 15}, "StartTime": 1572781652.2497}

[11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=193302.289226 records/second
[2019-11-03 11:47:32.496] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 33, "duration": 116, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 17 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 86, "sum": 86.0, "min": 86}, "Total Records Seen": {"count": 1, "max": 430000, "sum": 430000.0, "min": 430000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 17, "sum": 17.0, "min": 17}}, "EndTime": 1572781652.496576, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 16}, "StartTime": 1572781652.379179}

[11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=212693.332035 records/second
[2019-11-03 11:47:32.615] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 35, "duration": 118, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 18 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 91, "sum": 91.0, "min": 91}, "Total Records Seen": {"count": 1, "max": 455000, "sum": 455000.0, "min": 455000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 18, "sum": 18.0, "min": 18}}, "EndTime": 1572781652.616174, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 17}, "StartTime": 1572781652.496823}

[11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=209236.884482 records/second
[2019-11-03 11:47:32.737] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 37, "duration": 119, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 19 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 96, "sum": 96.0, "min": 96}, "Total Records Seen": {"count": 1, "max": 480000, "sum": 480000.0, "min": 480000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 19, "sum": 19.0, "min": 19}}, "EndTime": 1572781652.738183, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 18}, "StartTime": 1572781652.616413}

[11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=205061.132536 records/second
[2019-11-03 11:47:32.857] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 39, "duration": 119, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 20 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 101, "sum": 101.0, "min": 101}, "Total Records Seen": {"count": 1, "max": 505000, "sum": 505000.0, "min": 505000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 20, "sum": 20.0, "min": 20}}, "EndTime": 1572781652.858275, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 19}, "StartTime": 1572781652.738444}

[11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=208381.972142 records/second
[2019-11-03 11:47:32.966] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 41, "duration": 107, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 21 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 106, "sum": 106.0, "min": 106}, "Total Records Seen": {"count": 1, "max": 530000, "sum": 530000.0, "min": 530000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 21, "sum": 21.0, "min": 21}}, "EndTime": 1572781652.966966, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 20}, "StartTime": 1572781652.858526}

[11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=229766.962848 records/second
[2019-11-03 11:47:33.074] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 43, "duration": 106, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:33 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:33 INFO 140169171593024] #progress_metric: host=algo-1, completed 22 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 111, "sum": 111.0, "min": 111}, "Total Records Seen": {"count": 1, "max": 555000, "sum": 555000.0, "min": 555000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 22, "sum": 22.0, "min": 22}}, "EndTime": 1572781653.075226, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 21}, "StartTime": 1572781652.967602}

[11/03/2019 11:47:33 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=231852.986895 records/second
[2019-11-03 11:47:33.183] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 45, "duration": 105, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:33 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:33 INFO 140169171593024] #progress_metric: host=algo-1, completed 23 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 116, "sum": 116.0, "min": 116}, "Total Records Seen": {"count": 1, "max": 580000, "sum": 580000.0, "min": 580000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 23, "sum": 23.0, "min": 23}}, "EndTime": 1572781653.183966, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 22}, "StartTime": 1572781653.075554}

[11/03/2019 11:47:33 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=230295.815882 records/second
[2019-11-03 11:47:32.733] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 41, "duration": 106, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 21 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 106, "sum": 106.0, "min": 106}, "Total Records Seen": {"count": 1, "max": 530000, "sum": 530000.0, "min": 530000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 21, "sum": 21.0, "min": 21}}, "EndTime": 1572781652.733428, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 20}, "StartTime": 1572781652.626892}

[11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=234418.188728 records/second
[2019-11-03 11:47:32.852] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 43, "duration": 118, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 22 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 111, "sum": 111.0, "min": 111}, "Total Records Seen": {"count": 1, "max": 555000, "sum": 555000.0, "min": 555000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 22, "sum": 22.0, "min": 22}}, "EndTime": 1572781652.85269, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 21}, "StartTime": 1572781652.73363}

[11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=209777.294079 records/second
[2019-11-03 11:47:32.963] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 45, "duration": 110, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 23 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 116, "sum": 116.0, "min": 116}, "Total Records Seen": {"count": 1, "max": 580000, "sum": 580000.0, "min": 580000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 23, "sum": 23.0, "min": 23}}, "EndTime": 1572781652.963672, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 22}, "StartTime": 1572781652.852899}

[11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=225459.002118 records/second
[2019-11-03 11:47:33.079] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 47, "duration": 114, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:33 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 24 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 121, "sum": 121.0, "min": 121}, "Total Records Seen": {"count": 1, "max": 605000, "sum": 605000.0, "min": 605000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 24, "sum": 24.0, "min": 24}}, "EndTime": 1572781653.080286, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 23}, "StartTime": 1572781652.963875}

[11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=214553.817697 records/second
[2019-11-03 11:47:33.194] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 49, "duration": 113, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:33 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 25 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 126, "sum": 126.0, "min": 126}, "Total Records Seen": {"count": 1, "max": 630000, "sum": 630000.0, "min": 630000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 25, "sum": 25.0, "min": 25}}, "EndTime": 1572781653.194447, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 24}, "StartTime": 1572781653.080736}

[11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=219639.386018 records/second
[2019-11-03 11:47:33.304] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 51, "duration": 109, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:33 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 26 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 131, "sum": 131.0, "min": 131}, "Total Records Seen": {"count": 1, "max": 655000, "sum": 655000.0, "min": 655000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 26, "sum": 26.0, "min": 26}}, "EndTime": 1572781653.304679, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 25}, "StartTime": 1572781653.194648}

[11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=226936.503505 records/second
[2019-11-03 11:47:33.408] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 53, "duration": 103, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:33 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 27 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 136, "sum": 136.0, "min": 136}, "Total Records Seen": {"count": 1, "max": 680000, "sum": 680000.0, "min": 680000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 27, "sum": 27.0, "min": 27}}, "EndTime": 1572781653.408885, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 26}, "StartTime": 1572781653.305177}

[11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=240719.926538 records/second
[2019-11-03 11:47:33.506] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 55, "duration": 97, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:33 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 28 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 141, "sum": 141.0, "min": 141}, "Total Records Seen": {"count": 1, "max": 705000, "sum": 705000.0, "min": 705000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 28, "sum": 28.0, "min": 28}}, "EndTime": 1572781653.507087, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 27}, "StartTime": 1572781653.409142}

[11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=254881.161309 records/second
[2019-11-03 11:47:33.616] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 57, "duration": 108, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:33 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 29 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 146, "sum": 146.0, "min": 146}, "Total Records Seen": {"count": 1, "max": 730000, "sum": 730000.0, "min": 730000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 29, "sum": 29.0, "min": 29}}, "EndTime": 1572781653.616641, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 28}, "StartTime": 1572781653.507342}

[11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=228445.939469 records/second
[2019-11-03 11:47:33.285] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 47, "duration": 100, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:33 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:33 INFO 140169171593024] #progress_metric: host=algo-1, completed 24 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 121, "sum": 121.0, "min": 121}, "Total Records Seen": {"count": 1, "max": 605000, "sum": 605000.0, "min": 605000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 24, "sum": 24.0, "min": 24}}, "EndTime": 1572781653.285798, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 23}, "StartTime": 1572781653.184255}

[11/03/2019 11:47:33 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=245898.125588 records/second
[2019-11-03 11:47:33.395] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 49, "duration": 107, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:33 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:33 INFO 140169171593024] #progress_metric: host=algo-1, completed 25 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 126, "sum": 126.0, "min": 126}, "Total Records Seen": {"count": 1, "max": 630000, "sum": 630000.0, "min": 630000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 25, "sum": 25.0, "min": 25}}, "EndTime": 1572781653.395731, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 24}, "StartTime": 1572781653.287535}

[11/03/2019 11:47:33 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=230465.887587 records/second
[2019-11-03 11:47:33.507] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 51, "duration": 111, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:33 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:33 INFO 140169171593024] #progress_metric: host=algo-1, completed 26 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 131, "sum": 131.0, "min": 131}, "Total Records Seen": {"count": 1, "max": 655000, "sum": 655000.0, "min": 655000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 26, "sum": 26.0, "min": 26}}, "EndTime": 1572781653.507964, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 25}, "StartTime": 1572781653.396171}

[... output truncated: algo-1 emits the same five-line block for every epoch, continuing here through "completed 32 % of epochs" ...]
[2019-11-03 11:47:33.717] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 59, "duration": 100, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:33 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 30 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 151, "sum": 151.0, "min": 151}, "Total Records Seen": {"count": 1, "max": 755000, "sum": 755000.0, "min": 755000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 30, "sum": 30.0, "min": 30}}, "EndTime": 1572781653.71781, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 29}, "StartTime": 1572781653.616888}

[11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=247224.029801 records/second
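One detail worth noticing: each host reports exactly 25000 records per epoch and keeps its own independent "Total Records Seen" counter, which indicates the training channel is sharded across the two instances rather than fully replicated. As I understand the SDK, record_set() shards the uploaded data to match the instance count automatically; if the channel is wired up by hand, the equivalent setting is requested explicitly, as in this sketch (bucket name is again hypothetical):

```python
# Explicit channel configuration with sharding (SDK v1 style); bucket is hypothetical.
from sagemaker.session import s3_input

bucket = "my-mlops-demo-bucket"
train_channel = s3_input(
    f"s3://{bucket}/kmeans/data/train",
    distribution="ShardedByS3Key",  # each instance reads only its own slice of the data
)
# The channel would then be passed to an estimator, e.g.:
# estimator.fit({"train": train_channel})
```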
[... output truncated: algo-1 and algo-2 keep interleaving identical per-epoch blocks; by the end of this stretch algo-1 reports "completed 50 % of epochs" and algo-2 "completed 54 % of epochs", with train throughput fluctuating between roughly 200,000 and 270,000 records/second per host ...]

[11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=241924.550851 records/second
[2019-11-03 11:47:36.221] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 109, "duration": 90, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 55 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 276, "sum": 276.0, "min": 276}, "Total Records Seen": {"count": 1, "max": 1380000, "sum": 1380000.0, "min": 1380000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 55, "sum": 55.0, "min": 55}}, "EndTime": 1572781656.222293, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 54}, "StartTime": 1572781656.131432}

[11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=274815.754437 records/second
[2019-11-03 11:47:36.316] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 111, "duration": 93, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 56 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 281, "sum": 281.0, "min": 281}, "Total Records Seen": {"count": 1, "max": 1405000, "sum": 1405000.0, "min": 1405000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 56, "sum": 56.0, "min": 56}}, "EndTime": 1572781656.317027, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 55}, "StartTime": 1572781656.222511}

[11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=264206.794548 records/second
[2019-11-03 11:47:36.408] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 113, "duration": 89, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 57 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 286, "sum": 286.0, "min": 286}, "Total Records Seen": {"count": 1, "max": 1430000, "sum": 1430000.0, "min": 1430000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 57, "sum": 57.0, "min": 57}}, "EndTime": 1572781656.40873, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 56}, "StartTime": 1572781656.318606}

[11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=277057.301919 records/second
[2019-11-03 11:47:36.506] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 115, "duration": 97, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 58 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 291, "sum": 291.0, "min": 291}, "Total Records Seen": {"count": 1, "max": 1455000, "sum": 1455000.0, "min": 1455000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 58, "sum": 58.0, "min": 58}}, "EndTime": 1572781656.507134, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 57}, "StartTime": 1572781656.408955}

[11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=254313.064948 records/second
[2019-11-03 11:47:36.611] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 117, "duration": 102, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 59 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 296, "sum": 296.0, "min": 296}, "Total Records Seen": {"count": 1, "max": 1480000, "sum": 1480000.0, "min": 1480000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 59, "sum": 59.0, "min": 59}}, "EndTime": 1572781656.61187, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 58}, "StartTime": 1572781656.508761}

[11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=242210.667584 records/second
[2019-11-03 11:47:36.315] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 101, "duration": 118, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:36 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:36 INFO 140169171593024] #progress_metric: host=algo-1, completed 51 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 256, "sum": 256.0, "min": 256}, "Total Records Seen": {"count": 1, "max": 1280000, "sum": 1280000.0, "min": 1280000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 51, "sum": 51.0, "min": 51}}, "EndTime": 1572781656.316061, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 50}, "StartTime": 1572781656.19707}

[11/03/2019 11:47:36 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=209839.005013 records/second
[2019-11-03 11:47:36.435] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 103, "duration": 118, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:36 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:36 INFO 140169171593024] #progress_metric: host=algo-1, completed 52 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 261, "sum": 261.0, "min": 261}, "Total Records Seen": {"count": 1, "max": 1305000, "sum": 1305000.0, "min": 1305000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 52, "sum": 52.0, "min": 52}}, "EndTime": 1572781656.435687, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 51}, "StartTime": 1572781656.316337}

[11/03/2019 11:47:36 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=209205.157825 records/second
[2019-11-03 11:47:36.550] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 105, "duration": 113, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:36 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:36 INFO 140169171593024] #progress_metric: host=algo-1, completed 53 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 266, "sum": 266.0, "min": 266}, "Total Records Seen": {"count": 1, "max": 1330000, "sum": 1330000.0, "min": 1330000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 53, "sum": 53.0, "min": 53}}, "EndTime": 1572781656.550555, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 52}, "StartTime": 1572781656.436036}

[11/03/2019 11:47:36 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=218056.742633 records/second
[2019-11-03 11:47:36.667] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 107, "duration": 116, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:36 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:36 INFO 140169171593024] #progress_metric: host=algo-1, completed 54 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 271, "sum": 271.0, "min": 271}, "Total Records Seen": {"count": 1, "max": 1355000, "sum": 1355000.0, "min": 1355000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 54, "sum": 54.0, "min": 54}}, "EndTime": 1572781656.668235, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 53}, "StartTime": 1572781656.550799}

[11/03/2019 11:47:36 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=212650.198033 records/second
[2019-11-03 11:47:36.783] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 109, "duration": 114, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:36 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:36 INFO 140169171593024] #progress_metric: host=algo-1, completed 55 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 276, "sum": 276.0, "min": 276}, "Total Records Seen": {"count": 1, "max": 1380000, "sum": 1380000.0, "min": 1380000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 55, "sum": 55.0, "min": 55}}, "EndTime": 1572781656.783669, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 54}, "StartTime": 1572781656.668477}

[11/03/2019 11:47:36 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=216786.336732 records/second
[2019-11-03 11:47:36.893] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 111, "duration": 109, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:36 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:36 INFO 140169171593024] #progress_metric: host=algo-1, completed 56 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 281, "sum": 281.0, "min": 281}, "Total Records Seen": {"count": 1, "max": 1405000, "sum": 1405000.0, "min": 1405000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 56, "sum": 56.0, "min": 56}}, "EndTime": 1572781656.894322, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 55}, "StartTime": 1572781656.783985}

[11/03/2019 11:47:36 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=226315.925788 records/second
[2019-11-03 11:47:37.002] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 113, "duration": 107, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 57 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 286, "sum": 286.0, "min": 286}, "Total Records Seen": {"count": 1, "max": 1430000, "sum": 1430000.0, "min": 1430000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 57, "sum": 57.0, "min": 57}}, "EndTime": 1572781657.002531, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 56}, "StartTime": 1572781656.894559}

[11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=231273.599887 records/second
[2019-11-03 11:47:37.117] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 115, "duration": 112, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 58 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 291, "sum": 291.0, "min": 291}, "Total Records Seen": {"count": 1, "max": 1455000, "sum": 1455000.0, "min": 1455000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 58, "sum": 58.0, "min": 58}}, "EndTime": 1572781657.117718, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 57}, "StartTime": 1572781657.004481}

[11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=220530.454552 records/second
[2019-11-03 11:47:37.229] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 117, "duration": 109, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 59 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 296, "sum": 296.0, "min": 296}, "Total Records Seen": {"count": 1, "max": 1480000, "sum": 1480000.0, "min": 1480000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 59, "sum": 59.0, "min": 59}}, "EndTime": 1572781657.229763, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 58}, "StartTime": 1572781657.119333}

[11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=226135.826937 records/second
[2019-11-03 11:47:37.346] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 119, "duration": 114, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 60 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 301, "sum": 301.0, "min": 301}, "Total Records Seen": {"count": 1, "max": 1505000, "sum": 1505000.0, "min": 1505000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 60, "sum": 60.0, "min": 60}}, "EndTime": 1572781657.346454, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 59}, "StartTime": 1572781657.231374}

[11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=217004.377024 records/second
[2019-11-03 11:47:37.458] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 121, "duration": 111, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 61 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 306, "sum": 306.0, "min": 306}, "Total Records Seen": {"count": 1, "max": 1530000, "sum": 1530000.0, "min": 1530000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 61, "sum": 61.0, "min": 61}}, "EndTime": 1572781657.459096, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 60}, "StartTime": 1572781657.346694}

[11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=222127.695632 records/second
[2019-11-03 11:47:37.564] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 123, "duration": 105, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 62 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 311, "sum": 311.0, "min": 311}, "Total Records Seen": {"count": 1, "max": 1555000, "sum": 1555000.0, "min": 1555000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 62, "sum": 62.0, "min": 62}}, "EndTime": 1572781657.56538, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 61}, "StartTime": 1572781657.459356}

[11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=235513.85916 records/second
[2019-11-03 11:47:37.674] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 125, "duration": 108, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 63 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 316, "sum": 316.0, "min": 316}, "Total Records Seen": {"count": 1, "max": 1580000, "sum": 1580000.0, "min": 1580000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 63, "sum": 63.0, "min": 63}}, "EndTime": 1572781657.674904, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 62}, "StartTime": 1572781657.565617}

[11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=228475.307499 records/second
[2019-11-03 11:47:37.780] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 127, "duration": 104, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 64 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 321, "sum": 321.0, "min": 321}, "Total Records Seen": {"count": 1, "max": 1605000, "sum": 1605000.0, "min": 1605000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 64, "sum": 64.0, "min": 64}}, "EndTime": 1572781657.780621, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 63}, "StartTime": 1572781657.675158}

[11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=236744.830825 records/second
[2019-11-03 11:47:37.902] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 129, "duration": 121, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 65 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 326, "sum": 326.0, "min": 326}, "Total Records Seen": {"count": 1, "max": 1630000, "sum": 1630000.0, "min": 1630000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 65, "sum": 65.0, "min": 65}}, "EndTime": 1572781657.903117, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 64}, "StartTime": 1572781657.78087}

[11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=204263.409598 records/second
[2019-11-03 11:47:38.008] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 131, "duration": 105, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:38 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:38 INFO 140169171593024] #progress_metric: host=algo-1, completed 66 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 331, "sum": 331.0, "min": 331}, "Total Records Seen": {"count": 1, "max": 1655000, "sum": 1655000.0, "min": 1655000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 66, "sum": 66.0, "min": 66}}, "EndTime": 1572781658.009231, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 65}, "StartTime": 1572781657.90343}

[11/03/2019 11:47:38 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=235951.071548 records/second
[2019-11-03 11:47:38.115] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 133, "duration": 104, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:38 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:38 INFO 140169171593024] #progress_metric: host=algo-1, completed 67 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 336, "sum": 336.0, "min": 336}, "Total Records Seen": {"count": 1, "max": 1680000, "sum": 1680000.0, "min": 1680000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 67, "sum": 67.0, "min": 67}}, "EndTime": 1572781658.115497, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 66}, "StartTime": 1572781658.009521}

[11/03/2019 11:47:38 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=235577.882222 records/second
[2019-11-03 11:47:38.218] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 135, "duration": 102, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:38 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:38 INFO 140169171593024] #progress_metric: host=algo-1, completed 68 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 341, "sum": 341.0, "min": 341}, "Total Records Seen": {"count": 1, "max": 1705000, "sum": 1705000.0, "min": 1705000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 68, "sum": 68.0, "min": 68}}, "EndTime": 1572781658.218867, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 67}, "StartTime": 1572781658.115755}

[11/03/2019 11:47:38 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=242118.947176 records/second
[2019-11-03 11:47:36.720] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 119, "duration": 106, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 60 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 301, "sum": 301.0, "min": 301}, "Total Records Seen": {"count": 1, "max": 1505000, "sum": 1505000.0, "min": 1505000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 60, "sum": 60.0, "min": 60}}, "EndTime": 1572781656.720562, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 59}, "StartTime": 1572781656.613444}

[11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=233106.50539 records/second
[2019-11-03 11:47:36.829] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 121, "duration": 108, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 61 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 306, "sum": 306.0, "min": 306}, "Total Records Seen": {"count": 1, "max": 1530000, "sum": 1530000.0, "min": 1530000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 61, "sum": 61.0, "min": 61}}, "EndTime": 1572781656.829433, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 60}, "StartTime": 1572781656.720761}

[11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=229819.839565 records/second
[2019-11-03 11:47:36.947] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 123, "duration": 115, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 62 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 311, "sum": 311.0, "min": 311}, "Total Records Seen": {"count": 1, "max": 1555000, "sum": 1555000.0, "min": 1555000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 62, "sum": 62.0, "min": 62}}, "EndTime": 1572781656.94762, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 61}, "StartTime": 1572781656.831255}

[11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=214647.367204 records/second
[2019-11-03 11:47:37.049] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 125, "duration": 99, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 63 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 316, "sum": 316.0, "min": 316}, "Total Records Seen": {"count": 1, "max": 1580000, "sum": 1580000.0, "min": 1580000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 63, "sum": 63.0, "min": 63}}, "EndTime": 1572781657.050018, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 62}, "StartTime": 1572781656.949825}

[11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=249187.496138 records/second
[2019-11-03 11:47:37.145] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 127, "duration": 95, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 64 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 321, "sum": 321.0, "min": 321}, "Total Records Seen": {"count": 1, "max": 1605000, "sum": 1605000.0, "min": 1605000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 64, "sum": 64.0, "min": 64}}, "EndTime": 1572781657.146114, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 63}, "StartTime": 1572781657.050269}

[11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=260495.066231 records/second
[2019-11-03 11:47:37.247] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 129, "duration": 100, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 65 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 326, "sum": 326.0, "min": 326}, "Total Records Seen": {"count": 1, "max": 1630000, "sum": 1630000.0, "min": 1630000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 65, "sum": 65.0, "min": 65}}, "EndTime": 1572781657.247753, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 64}, "StartTime": 1572781657.14635}

[11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=246215.691385 records/second
[2019-11-03 11:47:37.343] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 131, "duration": 95, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 66 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 331, "sum": 331.0, "min": 331}, "Total Records Seen": {"count": 1, "max": 1655000, "sum": 1655000.0, "min": 1655000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 66, "sum": 66.0, "min": 66}}, "EndTime": 1572781657.344179, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 65}, "StartTime": 1572781657.248007}

[11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=259551.727125 records/second
[2019-11-03 11:47:37.451] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 133, "duration": 106, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 67 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 336, "sum": 336.0, "min": 336}, "Total Records Seen": {"count": 1, "max": 1680000, "sum": 1680000.0, "min": 1680000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 67, "sum": 67.0, "min": 67}}, "EndTime": 1572781657.451658, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 66}, "StartTime": 1572781657.344442}

[11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=232884.920767 records/second
[2019-11-03 11:47:37.550] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 135, "duration": 98, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 68 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 341, "sum": 341.0, "min": 341}, "Total Records Seen": {"count": 1, "max": 1705000, "sum": 1705000.0, "min": 1705000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 68, "sum": 68.0, "min": 68}}, "EndTime": 1572781657.551124, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 67}, "StartTime": 1572781657.451914}

[11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=251663.4746 records/second
[2019-11-03 11:47:37.652] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 137, "duration": 100, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 69 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 346, "sum": 346.0, "min": 346}, "Total Records Seen": {"count": 1, "max": 1730000, "sum": 1730000.0, "min": 1730000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 69, "sum": 69.0, "min": 69}}, "EndTime": 1572781657.652952, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 68}, "StartTime": 1572781657.551352}

[11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=245806.472786 records/second
[2019-11-03 11:47:37.745] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 139, "duration": 91, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 70 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 351, "sum": 351.0, "min": 351}, "Total Records Seen": {"count": 1, "max": 1755000, "sum": 1755000.0, "min": 1755000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 70, "sum": 70.0, "min": 70}}, "EndTime": 1572781657.745565, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 69}, "StartTime": 1572781657.653178}

[11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=270211.850321 records/second
[2019-11-03 11:47:37.839] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 141, "duration": 93, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 71 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 356, "sum": 356.0, "min": 356}, "Total Records Seen": {"count": 1, "max": 1780000, "sum": 1780000.0, "min": 1780000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 71, "sum": 71.0, "min": 71}}, "EndTime": 1572781657.839737, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 70}, "StartTime": 1572781657.745817}

[11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=265818.946941 records/second
[2019-11-03 11:47:37.946] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 143, "duration": 106, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 72 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 361, "sum": 361.0, "min": 361}, "Total Records Seen": {"count": 1, "max": 1805000, "sum": 1805000.0, "min": 1805000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 72, "sum": 72.0, "min": 72}}, "EndTime": 1572781657.94681, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 71}, "StartTime": 1572781657.839979}

[11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=233723.773457 records/second
[2019-11-03 11:47:38.048] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 145, "duration": 100, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:38 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:38 INFO 140552810366784] #progress_metric: host=algo-2, completed 73 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 366, "sum": 366.0, "min": 366}, "Total Records Seen": {"count": 1, "max": 1830000, "sum": 1830000.0, "min": 1830000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 73, "sum": 73.0, "min": 73}}, "EndTime": 1572781658.048629, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 72}, "StartTime": 1572781657.947054}

[11/03/2019 11:47:38 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=245784.578458 records/second
[2019-11-03 11:47:38.153] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 147, "duration": 104, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:38 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:38 INFO 140552810366784] #progress_metric: host=algo-2, completed 74 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 371, "sum": 371.0, "min": 371}, "Total Records Seen": {"count": 1, "max": 1855000, "sum": 1855000.0, "min": 1855000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 74, "sum": 74.0, "min": 74}}, "EndTime": 1572781658.153764, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 73}, "StartTime": 1572781658.048879}

[11/03/2019 11:47:38 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=238061.139932 records/second
[2019-11-03 11:47:38.257] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 149, "duration": 103, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:38 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:38 INFO 140552810366784] #progress_metric: host=algo-2, completed 75 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 376, "sum": 376.0, "min": 376}, "Total Records Seen": {"count": 1, "max": 1880000, "sum": 1880000.0, "min": 1880000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 75, "sum": 75.0, "min": 75}}, "EndTime": 1572781658.25836, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 74}, "StartTime": 1572781658.154126}

[11/03/2019 11:47:38 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=239560.073017 records/second
[2019-11-03 11:47:38.361] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 151, "duration": 101, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:38 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:38 INFO 140552810366784] #progress_metric: host=algo-2, completed 76 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 381, "sum": 381.0, "min": 381}, "Total Records Seen": {"count": 1, "max": 1905000, "sum": 1905000.0, "min": 1905000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 76, "sum": 76.0, "min": 76}}, "EndTime": 1572781658.362082, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 75}, "StartTime": 1572781658.258831}

[11/03/2019 11:47:38 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=241788.993576 records/second
[2019-11-03 11:47:38.327] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 137, "duration": 107, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:38 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:38 INFO 140169171593024] #progress_metric: host=algo-1, completed 69 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 346, "sum": 346.0, "min": 346}, "Total Records Seen": {"count": 1, "max": 1730000, "sum": 1730000.0, "min": 1730000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 69, "sum": 69.0, "min": 69}}, "EndTime": 1572781658.327902, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 68}, "StartTime": 1572781658.219121}

[11/03/2019 11:47:38 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=229541.126148 records/second

... [log truncated for readability: the same epoch_stats / #progress_metric / #metrics / #throughput_metric cycle repeats for every remaining epoch on both hosts, processing 25,000 examples per epoch with train throughput ranging from roughly 179,000 to 275,000 records/second, until each host reports "completed 100 % of epochs"] ...
[2019-11-03 11:47:40.938] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 199, "duration": 99, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:40 INFO 140552810366784] processed a total of 25000 examples
[11/03/2019 11:47:40 INFO 140552810366784] #progress_metric: host=algo-2, completed 100 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 501, "sum": 501.0, "min": 501}, "Total Records Seen": {"count": 1, "max": 2505000, "sum": 2505000.0, "min": 2505000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 100, "sum": 100.0, "min": 100}}, "EndTime": 1572781660.939309, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 99}, "StartTime": 1572781660.838951}

[11/03/2019 11:47:40 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=248843.324315 records/second
[2019-11-03 11:47:41.293] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 189, "duration": 101, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:41 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:41 INFO 140169171593024] #progress_metric: host=algo-1, completed 95 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 476, "sum": 476.0, "min": 476}, "Total Records Seen": {"count": 1, "max": 2380000, "sum": 2380000.0, "min": 2380000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 95, "sum": 95.0, "min": 95}}, "EndTime": 1572781661.294615, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 94}, "StartTime": 1572781661.189859}

[11/03/2019 11:47:41 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=238360.94119 records/second
[2019-11-03 11:47:41.400] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 191, "duration": 105, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:41 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:41 INFO 140169171593024] #progress_metric: host=algo-1, completed 96 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 481, "sum": 481.0, "min": 481}, "Total Records Seen": {"count": 1, "max": 2405000, "sum": 2405000.0, "min": 2405000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 96, "sum": 96.0, "min": 96}}, "EndTime": 1572781661.401357, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 95}, "StartTime": 1572781661.294858}

[11/03/2019 11:47:41 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=234471.130948 records/second
[2019-11-03 11:47:41.505] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 193, "duration": 103, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:41 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:41 INFO 140169171593024] #progress_metric: host=algo-1, completed 97 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 486, "sum": 486.0, "min": 486}, "Total Records Seen": {"count": 1, "max": 2430000, "sum": 2430000.0, "min": 2430000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 97, "sum": 97.0, "min": 97}}, "EndTime": 1572781661.506414, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 96}, "StartTime": 1572781661.401588}

[11/03/2019 11:47:41 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=238201.746913 records/second
[2019-11-03 11:47:41.607] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 195, "duration": 99, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:41 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:41 INFO 140169171593024] #progress_metric: host=algo-1, completed 98 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 491, "sum": 491.0, "min": 491}, "Total Records Seen": {"count": 1, "max": 2455000, "sum": 2455000.0, "min": 2455000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 98, "sum": 98.0, "min": 98}}, "EndTime": 1572781661.608669, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 97}, "StartTime": 1572781661.506654}

[11/03/2019 11:47:41 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=244752.499282 records/second
[2019-11-03 11:47:41.708] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 197, "duration": 99, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:41 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:41 INFO 140169171593024] #progress_metric: host=algo-1, completed 99 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 496, "sum": 496.0, "min": 496}, "Total Records Seen": {"count": 1, "max": 2480000, "sum": 2480000.0, "min": 2480000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 99, "sum": 99.0, "min": 99}}, "EndTime": 1572781661.70883, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 98}, "StartTime": 1572781661.608911}

[11/03/2019 11:47:41 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=249809.6486 records/second
[2019-11-03 11:47:41.810] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 199, "duration": 100, "num_examples": 5, "num_bytes": 79100000}
[11/03/2019 11:47:41 INFO 140169171593024] processed a total of 25000 examples
[11/03/2019 11:47:41 INFO 140169171593024] #progress_metric: host=algo-1, completed 100 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 501, "sum": 501.0, "min": 501}, "Total Records Seen": {"count": 1, "max": 2505000, "sum": 2505000.0, "min": 2505000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 100, "sum": 100.0, "min": 100}}, "EndTime": 1572781661.810584, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 99}, "StartTime": 1572781661.70911}

[11/03/2019 11:47:41 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=246070.664214 records/second
[11/03/2019 11:47:41 INFO 140169171593024] shrinking 100 centers into 10
[11/03/2019 11:47:41 INFO 140169171593024] local kmeans attempt #0. Current mean square distance 12.902647
[11/03/2019 11:47:41 INFO 140169171593024] local kmeans attempt #1. Current mean square distance 11.803318
[11/03/2019 11:47:41 INFO 140169171593024] local kmeans attempt #2. Current mean square distance 12.321064
[11/03/2019 11:47:41 INFO 140169171593024] local kmeans attempt #3. Current mean square distance 12.036984
[11/03/2019 11:47:42 INFO 140169171593024] local kmeans attempt #4. Current mean square distance 12.555333
[11/03/2019 11:47:42 INFO 140169171593024] local kmeans attempt #5. Current mean square distance 12.615070
[11/03/2019 11:47:42 INFO 140169171593024] local kmeans attempt #6. Current mean square distance 11.918087
[11/03/2019 11:47:42 INFO 140169171593024] local kmeans attempt #7. Current mean square distance 12.279174
[11/03/2019 11:47:42 INFO 140169171593024] local kmeans attempt #8. Current mean square distance 12.339795
[11/03/2019 11:47:42 INFO 140169171593024] local kmeans attempt #9. Current mean square distance 12.555266
[11/03/2019 11:47:42 INFO 140169171593024] finished shrinking process. Mean Square Distance = 12
[11/03/2019 11:47:42 INFO 140169171593024] #quality_metric: host=algo-1, train msd <loss>=11.8033180237
[11/03/2019 11:47:42 INFO 140169171593024] batch data loading with context took: 38.6209%, (4.388304 secs)
[11/03/2019 11:47:42 INFO 140169171593024] compute all data-center distances: point norm took: 19.0106%, (2.160087 secs)
[11/03/2019 11:47:42 INFO 140169171593024] gradient: cluster center took: 13.1121%, (1.489863 secs)
[11/03/2019 11:47:42 INFO 140169171593024] compute all data-center distances: inner product took: 9.4443%, (1.073109 secs)
[11/03/2019 11:47:42 INFO 140169171593024] collect from kv store took: 5.5164%, (0.626799 secs)
[11/03/2019 11:47:42 INFO 140169171593024] predict compute msd took: 4.7494%, (0.539646 secs)
[11/03/2019 11:47:42 INFO 140169171593024] gradient: cluster size  took: 3.1338%, (0.356081 secs)
[11/03/2019 11:47:42 INFO 140169171593024] splitting centers key-value pair took: 1.9277%, (0.219037 secs)
[11/03/2019 11:47:42 INFO 140169171593024] compute all data-center distances: center norm took: 1.5278%, (0.173592 secs)
[11/03/2019 11:47:42 INFO 140169171593024] gradient: one_hot took: 1.4084%, (0.160024 secs)
[11/03/2019 11:47:42 INFO 140169171593024] update state and report convergance took: 1.3147%, (0.149378 secs)
[11/03/2019 11:47:42 INFO 140169171593024] update set-up time took: 0.1200%, (0.013640 secs)
[11/03/2019 11:47:42 INFO 140169171593024] predict minus dist took: 0.1141%, (0.012959 secs)
[11/03/2019 11:47:42 INFO 140169171593024] TOTAL took: 11.3625204563
[11/03/2019 11:47:42 INFO 140169171593024] Number of GPUs being used: 0
#metrics {"Metrics": {"finalize.time": {"count": 1, "max": 387.3600959777832, "sum": 387.3600959777832, "min": 387.3600959777832}, "initialize.time": {"count": 1, "max": 42.871952056884766, "sum": 42.871952056884766, "min": 42.871952056884766}, "model.serialize.time": {"count": 1, "max": 0.2219676971435547, "sum": 0.2219676971435547, "min": 0.2219676971435547}, "update.time": {"count": 100, "max": 197.33190536499023, "sum": 11322.939395904541, "min": 97.9759693145752}, "epochs": {"count": 1, "max": 100, "sum": 100.0, "min": 100}, "state.serialize.time": {"count": 1, "max": 0.5171298980712891, "sum": 0.5171298980712891, "min": 0.5171298980712891}, "_shrink.time": {"count": 1, "max": 384.3569755554199, "sum": 384.3569755554199, "min": 384.3569755554199}}, "EndTime": 1572781662.199495, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/KMeansWebscale"}, "StartTime": 1572781650.32371}

[11/03/2019 11:47:42 INFO 140169171593024] Test data is not provided.
#metrics {"Metrics": {"totaltime": {"count": 1, "max": 13017.530918121338, "sum": 13017.530918121338, "min": 13017.530918121338}, "setuptime": {"count": 1, "max": 30.853986740112305, "sum": 30.853986740112305, "min": 30.853986740112305}}, "EndTime": 1572781662.202104, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/KMeansWebscale"}, "StartTime": 1572781662.199603}

[11/03/2019 11:47:41 INFO 140552810366784] shrinking 100 centers into 10
[11/03/2019 11:47:41 INFO 140552810366784] local kmeans attempt #0. Current mean square distance 12.250052
[11/03/2019 11:47:41 INFO 140552810366784] local kmeans attempt #1. Current mean square distance 12.186016
[11/03/2019 11:47:41 INFO 140552810366784] local kmeans attempt #2. Current mean square distance 12.200719
[11/03/2019 11:47:41 INFO 140552810366784] local kmeans attempt #3. Current mean square distance 11.887745
[11/03/2019 11:47:41 INFO 140552810366784] local kmeans attempt #4. Current mean square distance 12.341534
[11/03/2019 11:47:41 INFO 140552810366784] local kmeans attempt #5. Current mean square distance 12.504448
[11/03/2019 11:47:42 INFO 140552810366784] local kmeans attempt #6. Current mean square distance 12.133743
[11/03/2019 11:47:42 INFO 140552810366784] local kmeans attempt #7. Current mean square distance 12.772625
[11/03/2019 11:47:42 INFO 140552810366784] local kmeans attempt #8. Current mean square distance 12.143409
[11/03/2019 11:47:42 INFO 140552810366784] local kmeans attempt #9. Current mean square distance 12.344214
[11/03/2019 11:47:42 INFO 140552810366784] finished shrinking process. Mean Square Distance = 12
[11/03/2019 11:47:42 INFO 140552810366784] #quality_metric: host=algo-2, train msd <loss>=11.8877449036
[11/03/2019 11:47:42 INFO 140552810366784] batch data loading with context took: 31.9681%, (3.320623 secs)
[11/03/2019 11:47:42 INFO 140552810366784] compute all data-center distances: point norm took: 20.7105%, (2.151268 secs)
[11/03/2019 11:47:42 INFO 140552810366784] collect from kv store took: 13.6408%, (1.416910 secs)
[11/03/2019 11:47:42 INFO 140552810366784] gradient: cluster center took: 11.5084%, (1.195417 secs)
[11/03/2019 11:47:42 INFO 140552810366784] compute all data-center distances: inner product took: 9.2459%, (0.960398 secs)
[11/03/2019 11:47:42 INFO 140552810366784] predict compute msd took: 4.4798%, (0.465329 secs)
[11/03/2019 11:47:42 INFO 140552810366784] gradient: cluster size  took: 3.0899%, (0.320962 secs)
[11/03/2019 11:47:42 INFO 140552810366784] gradient: one_hot took: 1.5796%, (0.164074 secs)
[11/03/2019 11:47:42 INFO 140552810366784] update state and report convergance took: 1.2818%, (0.133143 secs)
[11/03/2019 11:47:42 INFO 140552810366784] splitting centers key-value pair took: 1.1349%, (0.117886 secs)
[11/03/2019 11:47:42 INFO 140552810366784] compute all data-center distances: center norm took: 1.1272%, (0.117085 secs)
[11/03/2019 11:47:42 INFO 140552810366784] predict minus dist took: 0.1201%, (0.012476 secs)
[11/03/2019 11:47:42 INFO 140552810366784] update set-up time took: 0.1130%, (0.011741 secs)
[11/03/2019 11:47:42 INFO 140552810366784] TOTAL took: 10.3873124123
[11/03/2019 11:47:42 INFO 140552810366784] Number of GPUs being used: 0
[11/03/2019 11:47:42 INFO 140552810366784] No model is serialized on a non-master node
#metrics {"Metrics": {"finalize.time": {"count": 1, "max": 291.3999557495117, "sum": 291.3999557495117, "min": 291.3999557495117}, "initialize.time": {"count": 1, "max": 41.98312759399414, "sum": 41.98312759399414, "min": 41.98312759399414}, "model.serialize.time": {"count": 1, "max": 0.07700920104980469, "sum": 0.07700920104980469, "min": 0.07700920104980469}, "update.time": {"count": 100, "max": 179.54707145690918, "sum": 10432.80816078186, "min": 89.97201919555664}, "epochs": {"count": 1, "max": 100, "sum": 100.0, "min": 100}, "state.serialize.time": {"count": 1, "max": 0.4820823669433594, "sum": 0.4820823669433594, "min": 0.4820823669433594}, "_shrink.time": {"count": 1, "max": 288.4190082550049, "sum": 288.4190082550049, "min": 288.4190082550049}}, "EndTime": 1572781662.107717, "Dimensions": {"Host": "algo-2", "Operation": "training", "Algorithm": "AWS/KMeansWebscale"}, "StartTime": 1572781650.328628}

[11/03/2019 11:47:42 INFO 140552810366784] Test data is not provided.
#metrics {"Metrics": {"totaltime": {"count": 1, "max": 13907.652139663696, "sum": 13907.652139663696, "min": 13907.652139663696}, "setuptime": {"count": 1, "max": 16.698122024536133, "sum": 16.698122024536133, "min": 16.698122024536133}}, "EndTime": 1572781662.109637, "Dimensions": {"Host": "algo-2", "Operation": "training", "Algorithm": "AWS/KMeansWebscale"}, "StartTime": 1572781662.107824}


2019-11-03 11:47:54 Uploading - Uploading generated training model
2019-11-03 11:47:54 Completed - Training job completed
Training seconds: 142
Billable seconds: 142
CPU times: user 7.93 s, sys: 394 ms, total: 8.33 s
Wall time: 3min 21s
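Training completes in 3 minutes 21 seconds of wall time, of which the job reports 142 billable seconds. Two details in the log above are worth a note: the quality metric reported per host (train msd <loss>=11.8033180237 on algo-1, 11.8877449036 on algo-2) is simply the best of that host's ten "local kmeans attempt" mean square distances from the shrinking step, and "Test data is not provided." only means the job was launched without the optional test channel. The structured #metrics entries are plain JSON after the prefix, so they are easy to post-process; below is a minimal sketch (not part of the original notebook — training_job.log stands in for a hypothetically saved copy of the log above):

import json
import re

# Illustrative only: parse the '#metrics' JSON payloads out of a saved training log.
# 'training_job.log' is a hypothetical local copy of the output shown above.
metrics = []
with open('training_job.log') as f:
    for line in f:
        m = re.match(r'#metrics\s+(\{.*\})\s*$', line.strip())
        if m:
            metrics.append(json.loads(m.group(1)))

# For example, report how many epochs each host recorded
for entry in metrics:
    if 'epochs' in entry.get('Metrics', {}):
        print(entry['Dimensions']['Host'], int(entry['Metrics']['epochs']['sum']))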
In [9]:
%%time

# Deploy the trained k-means model behind a real-time inference endpoint
kmeans_predictor = kmeans.deploy(initial_instance_count=1,
                                 instance_type='ml.m4.xlarge')
--------------------------------------------------------------------------------------------------!
CPU times: user 482 ms, sys: 38.7 ms, total: 521 ms
Wall time: 8min 14s
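Note that deploy() provisions a persistent real-time endpoint (hence the roughly eight minutes of wall time) and the instance keeps billing until the endpoint is deleted; the teardown step appears at the end of this walkthrough. To verify the endpoint came up healthy before sending traffic, here is a small boto3 sketch (illustrative, not part of the original notebook; in the SDK version used here the endpoint name is exposed as kmeans_predictor.endpoint):

import boto3

# Illustrative status check: confirm the endpoint created by deploy() is live
sm = boto3.client('sagemaker')
status = sm.describe_endpoint(EndpointName=kmeans_predictor.endpoint)['EndpointStatus']
print(status)  # expect 'InService' once deployment has finished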
In [10]:
%%time 

# Classify the first 100 validation images and extract each record's cluster label
result = kmeans_predictor.predict(valid_set[0][0:100])
clusters = [r.label['closest_cluster'].float32_tensor.values[0] for r in result]
CPU times: user 34.1 ms, sys: 353 µs, total: 34.5 ms
Wall time: 334 ms
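A hundred images come back classified in about a third of a second. Before plotting, a quick illustrative check (not in the original notebook) of how those 100 validation digits spread across the 10 clusters:

from collections import Counter

# Count how many of the 100 predicted labels fell into each cluster id
print(Counter(int(c) for c in clusters))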
In [11]:
for cluster in range(10):
    print('\n\n\nCluster {}:'.format(int(cluster)))
    # Collect the validation images that the endpoint assigned to this cluster
    digits = [img for l, img in zip(clusters, valid_set[0]) if int(l) == cluster]
    # Lay the digits out on a 5-wide grid (assumes every cluster received at least one digit)
    height = ((len(digits)-1)//5) + 1
    width = 5
    plt.rcParams["figure.figsize"] = (width,height)
    _, subplots = plt.subplots(height, width)
    subplots = numpy.ndarray.flatten(subplots)
    for subplot, image in zip(subplots, digits):
        show_digit(image, subplot=subplot)
    # Hide any unused axes in the last row of the grid
    for subplot in subplots[len(digits):]:
        subplot.axis('off')

    plt.show()


[Figure output: ten image grids, headed "Cluster 0:" through "Cluster 9:", showing the validation digits assigned to each k-means cluster; the rendered plots are not reproduced in this text capture.]
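Although the rendered image grids are not reproduced here, in a typical run of this example most clusters correspond visually to a single digit, while look-alike digits (4s and 9s, for instance) tend to share clusters; k-means on raw pixels groups by visual similarity rather than by the true label.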
In [12]:
# Classify a single validation image (index 230) and print the raw response record
result = kmeans_predictor.predict(valid_set[0][230:231])
print(result)
[label {
  key: "closest_cluster"
  value {
    float32_tensor {
      values: 4.0
    }
  }
}
label {
  key: "distance_to_cluster"
  value {
    float32_tensor {
      values: 6.309240818023682
    }
  }
}
]
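The response is a list of protobuf records, one per input row, and each record carries both labels shown above. A minimal sketch (reusing the result from the cell above) that reads them programmatically instead of eyeballing the printed record:

# Each record exposes its labels as a map of name -> tensor
record = result[0]
cluster = int(record.label['closest_cluster'].float32_tensor.values[0])
distance = record.label['distance_to_cluster'].float32_tensor.values[0]
print('closest cluster: {}, distance: {:.3f}'.format(cluster, distance))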
In [13]:
show_digit(valid_set[0][230], 'This is a {}'.format(valid_set[1][230]))
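Finally, since the deployed endpoint continues to bill for as long as it runs, tear it down once you are done experimenting. A one-line cleanup sketch (not captured in the run above, but supported by the SageMaker Python SDK version used here):

# Delete the real-time endpoint created by kmeans.deploy() to stop incurring charges
kmeans_predictor.delete_endpoint()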