Here I illustrate and compare the effect of quantization and model reduction on speed and size.
Post-training quantization is a conversion technique that can reduce model size while also improving CPU and hardware accelerator latency, with little degradation in model accuracy.
S No. | Name | Train Accuracy | Test Accuracy | Model Size |
---|---|---|---|---|
1 | Float Model | 0.9966 | 0.9800 | 84.8KB |
2 | Quantized Model | 0.9966 | 0.9732 | 23.1KB |
Quantization works by reducing the precision of the numbers used to represent a model's parameters, which by default are 32-bit floating point numbers. This results in a smaller model size and faster computation.
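As a minimal sketch of how this conversion can be done with TensorFlow Lite (the `model` object and the output file names below are placeholders, not this repository's exact code):

```python
import tensorflow as tf

# `model` is assumed to be an already-trained tf.keras.Model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Plain float32 TFLite model.
float_tflite = converter.convert()

# Enable post-training (dynamic range) quantization: weights are stored
# as 8-bit integers instead of 32-bit floats, shrinking the file.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite = converter.convert()

with open("model_float.tflite", "wb") as f:
    f.write(float_tflite)
with open("model_quantized.tflite", "wb") as f:
    f.write(quantized_tflite)
```

Storing weights in 8 bits instead of 32 accounts for the roughly 4x size drop between the float and quantized models in the tables here.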
This was tested on an Intel Core i5-8600K and an NVIDIA GTX 1080.
S No. | Name | Test Accuracy | Inference Time | Model Size |
---|---|---|---|---|
1 | Normal Model | 0.9823 | 0.3968 secs | 272.8KB |
2 | Quantized Model | 0.9816 | 1.366 secs | 23.4KB |
3 | TFLite Model | 0.9823 | 0.8510 secs | 83.4KB |
Inference speed depends on the model architecture and the target device: TFLite's quantized kernels are tuned for mobile and edge CPUs, so on a desktop CPU like the one used here the quantized model can run slower than the float model, as the table above shows. A sketch of how such timings can be measured follows.
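A minimal timing sketch, assuming an MNIST-style test set and the file names from the conversion sketch above (both are assumptions, not this repository's exact setup):

```python
import time
import numpy as np
import tensorflow as tf

# Assumption: an MNIST-scale test set; swap in this repository's actual data.
(_, _), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_test = (x_test / 255.0).astype(np.float32)

def time_tflite(path, samples):
    """Run every sample through a TFLite model and report elapsed time."""
    interpreter = tf.lite.Interpreter(model_path=path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    start = time.perf_counter()
    preds = []
    for x in samples:
        # Input must match the model's expected shape and dtype; adjust as needed.
        interpreter.set_tensor(inp["index"], np.expand_dims(x, 0).astype(inp["dtype"]))
        interpreter.invoke()
        preds.append(np.argmax(interpreter.get_tensor(out["index"])))
    elapsed = time.perf_counter() - start
    return np.array(preds), elapsed

preds, secs = time_tflite("model_quantized.tflite", x_test)
print(f"Inference over {len(x_test)} samples took {secs:.4f} secs")
```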