# Introduction to Data Flow Graphs, Tensors, and TensorFlow

##### 2020-12-30 13:10:14

## Tested On

- Linux Ubuntu 20.04
- Windows 10
- macOS Catalina

## Prerequisites

TensorFlow is an open-source deep learning framework developed by Google Brain and released in 2015 for building, training, evaluating, and optimizing neural networks and machine learning models. It provides a comprehensive ecosystem of tools for developers and researchers to build and scale state-of-the-art machine learning applications.

Specifically, TensorFlow specializes in numerical computation through the use of data flow graphs, which makes it easier to acquire data, train models, serve predictions, and refine future results. Its neural network capabilities make it well suited for classification, perception, discovery, understanding, creation, and prediction.

## What is a Data Flow Graph?

A data flow graph is a structure that describes how data travels through a collection of processing nodes. Each node is a unit of mathematical operation (+, -, ×, ÷, etc.), and each connection or edge is the data consumed or produced by the computation. For example, the function f(x, y) = x^2 * y + y + 3 can be represented as a graph of multiply and add nodes.

As we move through the graph, inputs are passed into the nodes (mathematical operations), producing outputs that are passed further down the graph. In this graph, the variables x and y and the constant 3 travel along the connections/edges. In a real-world neural network, this data is usually in the form of a multidimensional array (tensor).

This structure is highly scalable and grows naturally as the computation becomes more complex.
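As a minimal sketch of the idea (assuming TensorFlow 2.x is installed), the same function can be built from individual TensorFlow operations, where each call becomes a node in the graph; the inputs 2.0 and 3.0 are arbitrary example values:

```
import tensorflow as tf

# f(x, y) = x^2 * y + y + 3, built from individual operation nodes
x = tf.constant(2.0)
y = tf.constant(3.0)
f = tf.add(tf.add(tf.multiply(tf.square(x), y), y), 3.0)
print(f)  # tf.Tensor(18.0, shape=(), dtype=float32)
```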

## What is a Tensor?

The name TensorFlow comes from the operations neural networks perform on their core data structure: multidimensional arrays, or tensors. Tensors are the main objects manipulated throughout a TensorFlow program.

In mathematical terms, a tensor is a generalization of vectors and matrices to n dimensions. In simpler terms, it's a data structure that can take the form of a linear array, a 2-dimensional matrix, a 3-dimensional matrix, and so on, with as many dimensions as needed. If you're familiar with Python lists, the following examples will make sense to you.

```
# Linear array
numbers = [1, 2, 3, 4, 5]
# 2D array
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```
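As a short sketch (again assuming TensorFlow 2.x), the same lists can be turned into tensors with tf.constant:

```
import tensorflow as tf

# Linear array becomes a rank-1 tensor
numbers = tf.constant([1, 2, 3, 4, 5])
# 2D array becomes a rank-2 tensor
matrix = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(numbers.shape)  # (5,)
print(matrix.shape)   # (3, 3)
```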

## TensorFlow Features

- Portability: runs on various devices and platforms
- Scalability: scales from a single CPU to a GPU, clusters of GPUs, or multi-node TPU infrastructures
- Allows for powerful experimentation
- Has a worldwide community of developers and researchers
- Makes it easy to train and deploy models
- Provides a rich collection of tools for building models, e.g., data preprocessing, model evaluation, and data integration
- Works efficiently with multidimensional arrays
- Offers both C++ and Python APIs
- Supports fast debugging
- Scales computation across machines and large datasets

## TensorFlow Use Cases

- Data clustering
- Regression
- Image classification
- Fraud detection
- Natural language processing
- Reinforcement learning

## TensorFlow Sessions

Graphs are partially defined computations. When we write TensorFlow 1.x code, we create graphs. A graph alone is just a description of the computation; to actually run it, it must be loaded into memory and executed. TensorFlow uses *Sessions* for this purpose: a session allocates the resources to set up the graph/model and pass input through it.

## Setting Up a TensorFlow Project

### How to Create Python Project Files with Windows 10 PowerShell 2.0+

```
cd ~
New-Item -ItemType "directory" -Path ".\tensorflow-project"
cd tensorflow-project
virtualenv venv
.\venv\Scripts\activate
```

To verify that the virtual environment is active, make sure `(venv)` is in the PowerShell command prompt. For example, `(venv) PS C:\Users\username\tensorflow-project>`.

### How to Create Python Project Files with Linux Ubuntu 14.04+ or macOS

```
cd ~
mkdir tensorflow-project
cd tensorflow-project
virtualenv -p python3 venv
source venv/bin/activate
```

To verify that the virtual environment is active, make sure `(venv)` is in the terminal command prompt.

This will create the following files and folders, and activate the virtual environment.

```
▾ tensorflow-project/
▸ venv/
```

## Installing TensorFlow with Pip

TensorFlow requires NumPy and other dependencies, which pip installs automatically. TensorFlow 2.4 also requires Python 3.6 or above.

`pip install -U tensorflow==2.4.0`

You can also install TensorFlow with conda:

`conda install tensorflow==2.4.0`

## Setting Up TensorFlow Docker Containers

```
docker pull tensorflow/tensorflow:latest
docker run -it -p 8888:8888 tensorflow/tensorflow:latest-jupyter
```

The official TensorFlow documentation has a detailed guide for installing TensorFlow with virtual environments, Docker, and other dependencies.

## How to Create Different Tensors

To use TensorFlow, we first have to import it. Like other libraries for data science and scientific computations, TensorFlow has a popular alias, tf.

```
import tensorflow as tf
print(tf.__version__)
```

Assigning a variable and creating expressions is a little different from the Pythonic way of assigning variables. In TensorFlow, values are wrapped in tensor objects created through the tf alias.
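For example, here is a minimal comparison of plain Python assignment with its TensorFlow equivalent (the variable names are illustrative):

```
import tensorflow as tf

# Plain Python assignment
n = 5
# TensorFlow assignment: the value is wrapped in a tensor object
t = tf.Variable(5, dtype=tf.int32)
print(n)          # 5
print(t.numpy())  # 5
```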

## How to Store and Manipulate Data in TensorFlow

TensorFlow has three prominent ways of storing and manipulating data within data flow graphs: variables, constants, and placeholders. To store data that can change within the program, we use tf.Variable().

To store data that cannot change and remains constant throughout the program, we use tf.constant(). Add the following code to a file named main.py and execute it with python main.py.

```
import tensorflow as tf
tf_string = tf.Variable("This is a tensorflow string", dtype=tf.string)
tf_int = tf.Variable(1234, dtype=tf.int16)
cons1 = tf.constant([1, 2, 3, 4])
cons2 = tf.constant([5, 6, 7, 8])
print(cons1)
print(cons2)
```

In the above code, dtype objects are declared along with the variables and constants. We declare the dtype because we want to control how many bits of precision are used. A complete list of tf.dtypes.DType objects and their precisions can be found in the TensorFlow API documentation.
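As a brief sketch of dtype control (assuming TensorFlow 2.x), the same values can be stored at different precisions:

```
import tensorflow as tf

# 16-bit integers instead of the default int32
a = tf.constant([1, 2, 3], dtype=tf.int16)
# 64-bit floats instead of the default float32
b = tf.constant([1.5, 2.5], dtype=tf.float64)
print(a.dtype)  # <dtype: 'int16'>
print(b.dtype)  # <dtype: 'float64'>
```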

You may get the following warnings while running the python main.py command:

```
2020-12-31 15:05:21.179411: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2020-12-31 15:05:21.179525: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-12-31 15:05:22.678353: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-12-31 15:05:22.679027: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-12-31 15:05:22.679051: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2020-12-31 15:05:22.679075: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (x1): /proc/driver/nvidia/version does not exist
2020-12-31 15:05:22.679787: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
```

To resolve this, you will need a CUDA®-enabled card and to follow the TensorFlow GPU Support documentation. Note: You will still be able to complete this tutorial without GPU support. Just ignore the warnings.

Placeholders in TensorFlow allow us to create a storage space for data that will be assigned later. This data is not assigned up front, but is initialized by the Session when we run it.

```
import tensorflow as tf
x = tf.placeholder(tf.float32)
y = x + 5
with tf.Session() as sess:
    results = sess.run(y, feed_dict={x: 9.0})
    print(results)
```

You may get the following error while running the python main.py command with TensorFlow 2.0:

```
Traceback (most recent call last):
  File "main.py", line 3, in <module>
    x = tf.placeholder(tf.float32)
AttributeError: module 'tensorflow' has no attribute 'placeholder'
```

The placeholder() command belongs to the 1.x API. Update your code to import tensorflow.compat.v1 and rerun the program with python main.py.

```
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
x = tf.placeholder(tf.float32)
y = x + 5
with tf.Session() as sess:
    results = sess.run(y, feed_dict={x: 9.0})
    print(results)
```

When we work with TensorFlow, we often work with data loaded externally, from a local file or an image file. If all of this data were embedded directly in the program, it could become too big for the program's memory. The placeholder allows us to define the graph over data that is not yet available and supply it later.

To populate a placeholder, we pass a feed_dict argument to Session.run().
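A minimal sketch with two placeholders, using the 1.x compatibility API as above (the names a and b are illustrative):

```
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
total = a * b + a
with tf.Session() as sess:
    # feed_dict maps each placeholder to a concrete value
    result = sess.run(total, feed_dict={a: 2.0, b: 3.0})
print(result)  # 8.0
```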

## Running a Basic Program with TensorFlow

```
import tensorflow as tf
# Initializing two constants
cons1 = tf.constant([11, 12, 13, 14])
cons2 = tf.constant([15, 16, 17, 18])
# Multiplying the two arrays
results = tf.multiply(cons1, cons2)
# Printing the result
print(results)
```

This is a simple program that multiplies two constants element-wise. In TensorFlow 2.x, eager execution computes the result immediately, and printing shows the tensor with its values:

`tf.Tensor([165 192 221 252], shape=(4,), dtype=int32)`

Under the TensorFlow 1.x graph model, however, the graph does not execute until we run it inside a session.

```
import tensorflow as tf
# Initialize two constants
cons1 = tf.constant([11, 12, 13, 14])
cons2 = tf.constant([15, 16, 17, 18])
# Multiply
results = tf.multiply(cons1, cons2)
# Initializing the Session
sess = tf.Session()
# Printing the result
print(sess.run(results))
# Close the session
sess.close()
```

You may get the following error while running the python main.py command with TensorFlow 2.0:

```
Traceback (most recent call last):
  File "main.py", line 11, in <module>
    sess = tf.Session()
AttributeError: module 'tensorflow' has no attribute 'Session'
```

The Session() command belongs to the 1.x API. Update your code to import tensorflow.compat.v1 and rerun the program with python main.py.

```
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
# Initialize two constants
cons1 = tf.constant([11, 12, 13, 14])
cons2 = tf.constant([15, 16, 17, 18])
# Multiply
results = tf.multiply(cons1, cons2)
# Initializing the Session
sess = tf.Session()
# Printing the result
print(sess.run(results))
# Close the session
sess.close()
```

## Shapes and Ranks in TensorFlow

Rank in TensorFlow simply means the number of dimensions in a tensor.

```
rank0 = tf.Variable('Test', dtype=tf.string)
rank1 = tf.Variable(['Test', 'Me'], dtype=tf.string)
rank2 = tf.Variable([['Test', 'Ongoing'], ['Write', 'Yeah']], dtype=tf.string)
```

The rank0 variable holds a single element with no dimensions, which is why its rank is zero; it can be called a scalar value. The rank1 variable contains a single array, so it has one dimension and a rank of one. rank2 is a nested list, so it has two dimensions and a rank of two.

```
rank3 = tf.constant([[[1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4]]])
tf.rank(rank3) # 3
```

rank3 is a list of lists of lists; three levels of nesting give it a rank of 3.

Shape is the number of elements that exist in each dimension. For rank3, the shape is (2, 2, 3): two blocks, each containing two arrays of three elements. Calling tf.shape(rank3) returns:

`<tf.Tensor: shape=(3,), dtype=int32, numpy=array([2, 2, 3], dtype=int32)>`

tf.shape returns a 1-D integer tensor representing the shape of its input. For a scalar input, the returned tensor has shape (0,) and its value is the empty vector, []. In general, the shape is determined by the number of elements in each array block and the number of array blocks.
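A quick sketch of the scalar case described above (assuming TensorFlow 2.x):

```
import tensorflow as tf

scalar = tf.constant(5)
s = tf.shape(scalar)
print(s)  # tf.Tensor([], shape=(0,), dtype=int32)
```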

## Conclusion

TensorFlow's popularity is largely due to its simplicity when dealing with machine learning algorithms, and it is currently an ideal choice for developers and researchers. This has been a brief look at TensorFlow, its basic data structures, and how it holds data.

Shapes and ranks are important aspects of TensorFlow because tensors, the data flowing between graph nodes, are n-dimensional arrays representing the data. Rank and shape tell us the form these data flow graphs take and their hierarchy.

If you're interested in learning about data visualization, take our Real World Data Science with Python course. This course teaches you how to programmatically generate graphs and charts from existing datasets. You'll also learn Python fundamentals, and how to utilize frameworks like Matplotlib, Pandas, Numpy, and Seaborn.
